
 Strange query speed issue?


jonasfh
Starting Member

9 Posts

Posted - 2007-09-07 : 11:16:50
Hi,

I have a problem with the speed of a query. The query:
select  * from data_values v
inner join sensors s on (v.sensor_id=s.sensor_id)
inner join data_loggers l on (l.data_logger_id=s.data_logger_id )
where date_time>(select dateadd(minute, -2, max(date_time)) from data_values);


...fetches the last two minutes of data in the database. The data_values table is large (~2 million rows), while the sensors and data_loggers tables have 30 and 2 rows respectively. The query runs in about 1.5 seconds. However, if I don't include the joins, like this:
select  * from data_values v
where date_time>(select dateadd(minute, -2, max(date_time)) from data_values);

...it's about 10 times faster. I've tried using this part as a subquery to sample from, and joining on the subquery, without success. However, when I used Transact-SQL syntax to do a sort of subquery, I managed to speed up the query; see below:

declare @s as table(data_value_id int, data_type_id int, sensor_id int, quality_id int, date_time datetime, lat_wgs84 float, lon_wgs84 float, height int, parallell int, data_value float);

insert into @s select * from data_values v where date_time>(select dateadd(minute, -2, max(date_time)) from data_values);

select * from @s v inner join sensors s on (v.sensor_id=s.sensor_id) inner join data_loggers l on (l.data_logger_id=s.data_logger_id );


Any suggestions on how I can speed up the query using normal SQL? Or what's causing this? I have looked at the query analyzer, but didn't really get enlightened...
Another question: I'm using SQL Server Management Studio Express, and it only shows query duration in whole seconds. Does anyone know a way to change this, or another way to measure the time a query takes?
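One approach, used later in this thread, is to bracket the statement with getdate() and datediff to get a millisecond-level timing; the SELECT being timed below is only a placeholder:

```sql
-- Record the start time, run the statement, then print elapsed milliseconds.
declare @d datetime;
set @d = getdate();

select count(*) from data_values;  -- placeholder for the query under test

select datediff(millisecond, @d, getdate()) as elapsed_ms;
```

SQL Server also offers SET STATISTICS TIME ON, which reports parse/compile and execution times per statement in the Messages tab.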

regards Jonas:-))

saad.ahnad@gmail.com
Starting Member

15 Posts

Posted - 2007-09-07 : 12:33:55
select *
from
(
select dateadd(minute, -2, max(date_time)) max_date from data_values
) max_date
inner join data_values v on (v.date_time > max_date.max_date)
inner join sensors s on (v.sensor_id=s.sensor_id)
inner join data_loggers l on (l.data_logger_id=s.data_logger_id )

And have an index on date_time
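As a sketch, creating that index could look like this (the index name is just illustrative):

```sql
-- Nonclustered index on the column used in the range filter.
CREATE INDEX IX_data_values_date_time ON data_values (date_time);
```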

--
Saad Ahmad
saad.ahmad@gmail.com

Kristen
Test

22859 Posts

Posted - 2007-09-07 : 14:44:00
select * from data_values v
inner join sensors s on (v.sensor_id=s.sensor_id)
inner join data_loggers l on (l.data_logger_id=s.data_logger_id )
where date_time>(select dateadd(minute, -2, max(date_time)) from data_values);

1) Don't use SELECT *, list out the columns
2) Always prefix all table names with "dbo." (or the owner, if different)
3) Make sure you have indexes on all Joined columns:
data_values.date_time
data_values.sensor_id
sensors.sensor_id
sensors.data_logger_id
data_loggers.data_logger_id

Personally I would try selecting the "2 minute" limit into a working variable before you start:

DECLARE @StartTime datetime
SELECT @StartTime = dateadd(minute, -2, max(date_time))
FROM dbo.data_values WITH (NOLOCK) -- Data integrity not critical here
...
WHERE date_time > @StartTime

Note that you should time the initial run, but also the average of 5 or more subsequent runs as the likelihood is that the query plan will be cached.
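Putting those suggestions together, a sketch of the rewritten query might look like this. The data_values column names are taken from the table-variable declaration earlier in the thread; the sensors and data_loggers columns shown are hypothetical placeholders for whatever columns are actually needed:

```sql
DECLARE @StartTime datetime

-- Pre-calculate the cut-off so it is not re-evaluated inside the main query
SELECT @StartTime = dateadd(minute, -2, max(date_time))
FROM dbo.data_values WITH (NOLOCK) -- data integrity not critical here

-- Explicit column list and dbo. prefixes; s.sensor_name and l.logger_name
-- are made-up stand-ins for the real columns
SELECT v.data_value_id, v.sensor_id, v.date_time, v.data_value,
       s.sensor_name, l.logger_name
FROM dbo.data_values v
INNER JOIN dbo.sensors s ON v.sensor_id = s.sensor_id
INNER JOIN dbo.data_loggers l ON l.data_logger_id = s.data_logger_id
WHERE v.date_time > @StartTime
```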

Kristen

jonasfh
Starting Member

9 Posts

Posted - 2007-09-10 : 03:12:06
Sorry, I wasn't quite clear: I didn't mean that I didn't manage to write a subquery, it's just that the query was still running as slowly as before...

Jonas:))

quote:
Originally posted by saad.ahnad@gmail.com

select *
from
(
select dateadd(minute, -2, max(date_time)) max_date from data_values
) max_date
inner join data_values v on (v.date_time > max_date.max_date)
inner join sensors s on (v.sensor_id=s.sensor_id)
inner join data_loggers l on (l.data_logger_id=s.data_logger_id )

And have an index on date_time

--
Saad Ahmad
saad.ahmad@gmail.com


jonasfh
Starting Member

9 Posts

Posted - 2007-09-10 : 03:50:12
Hi Kristen,

thanks, it seems that simply by not using "select * ..." the query speed increased 10-fold. Do you have any explanation for this? I know I have read about it before in regard to MySQL databases, but should it really have such an enormous effect?

Anyway, thanks for your reply,

regards Jonas:))

quote:
Originally posted by Kristen

select * from data_values v
inner join sensors s on (v.sensor_id=s.sensor_id)
inner join data_loggers l on (l.data_logger_id=s.data_logger_id )
where date_time>(select dateadd(minute, -2, max(date_time)) from data_values);

1) Don't use SELECT *, list out the columns
2) Always prefix all table names with "dbo." (or the owner, if different)
3) Make sure you have indexes on all Joined columns:
data_values.date_time
data_values.sensor_id
sensors.sensor_id
sensors.data_logger_id
data_loggers.data_logger_id

Personally I would try selecting the "2 minute" limit into a working variable before you start:

DECLARE @StartTime datetime
SELECT @StartTime = dateadd(minute, -2, max(date_time))
FROM dbo.data_values WITH (NOLOCK) -- Data integrity not critical here
...
WHERE date_time > @StartTime

Note that you should time the initial run, but also the average of 5 or more subsequent runs as the likelihood is that the query plan will be cached.

Kristen


jonasfh
Starting Member

9 Posts

Posted - 2007-09-10 : 04:59:08
Hi,
I've done some further investigation into the matter and found that "select *" was not the problem. I'll try to illustrate.
The code below, selecting all columns from all joined tables, takes approx 1.4 seconds to execute. I ran the query several times, so it is not the query cache that makes the difference. There are indexes on all joined columns. I have also tried having the where clause as a subquery (also shown below). Writing out all the columns instead of doing select * does not change the result:
Normal:

declare @d datetime;
set @d =(select getdate());
select * from data_values v
inner join sensors s on (v.sensor_id=s.sensor_id)
inner join data_loggers l on (l.data_logger_id=s.data_logger_id )
where date_time>(select dateadd(minute, -2, max(date_time)) from data_values);
select datediff(millisecond,@d, getdate());


As subquery:

declare @d datetime;
set @d =(select getdate());
select * from (select * from data_values
where date_time>(select dateadd(minute, -2, max(date_time)) from data_values)) v
inner join sensors s on (v.sensor_id=s.sensor_id)
inner join data_loggers l on (l.data_logger_id=s.data_logger_id );
select datediff(millisecond,@d, getdate());


However, by forcing the join to be executed only on the result returned by the where clause, by first running that filter as a separate statement, the execution time drops from 1.4 seconds to approx 10 milliseconds on the first execution, and to less than 1 millisecond after that. That is maybe a 10^4-fold speed increase from this modification. It seems SQL Server is doing something seriously wrong here?


declare @d datetime;
set @d =(select getdate());
declare @s as table(data_value_id int, data_type_id int, sensor_id int, quality_id int, date_time datetime, lat_wgs84 float, lon_wgs84 float, height int, parallell int, data_value float);

insert into @s select * from data_values v where date_time>(select dateadd(minute, -2, max(date_time)) from data_values);

select * from @s v inner join sensors s on (v.sensor_id=s.sensor_id) inner join data_loggers l on (l.data_logger_id=s.data_logger_id );

select datediff(millisecond,@d, getdate());


Is there any way to force this behaviour without using the non-standard Transact-SQL syntax that I'm using now?

Regards Jonas:))

Kristen
Test

22859 Posts

Posted - 2007-09-10 : 10:33:56
"Writing out all the columns instead of doing select * does not change the result"

Well it will, but I would not expect it to be very significant.

Listing all the columns saves SQL Server having to:

1) Work out what they are
2) Check if any new ones have been added since the previous query was Cached.
3) Send your application new columns which it never needed and doesn't need now

Consider: You have a Customer Record. You retrieve it using SELECT *. In the future you add an Image of the customer, and a complete history of all the File Notes in a TEXT datatype column. Previously your "Get Customer" routine was getting around 200 bytes per customer. Now it's getting around 100K. BIG performance difference!

What was the impact of the other suggestions I made?

Kristen

jonasfh
Starting Member

9 Posts

Posted - 2007-09-11 : 03:00:06
Hi again,

yeah, I see your point about "select *", but in my case the difference was negligible. What turned out to make the big difference in the end was to take out the where subquery and calculate it first, like

declare @lastdate datetime;
set @lastdate=(select dateadd(minute , -5, max(date_time)) from
data_values);
select ... where date_time > @lastdate;

I don't really understand why SQL Server can't handle the subquery, though. It makes it kind of difficult to write standard, portable SQL code. I've been using a few other database engines, and using a subquery seems to be a fairly typical way to minimize the data used in a join etc.

However, thanks for your help,

regards, Jonas:))

quote:
Originally posted by Kristen

"Writing out all the columns instead of doing select * does not change the result"

Well it will, but I would not expect it to be very significant.

Listing all the columns saves SQL Server having to:

1) Work out what they are
2) Check if any new ones have been added since the previous query was Cached.
3) Send your application new columns which it never needed and doesn't need now

Consider: You have a Customer Record. You retrieve it using SELECT *. In the future you add an Image of the customer, and a complete history of all the File Notes in a TEXT datatype column. Previously your "Get Customer" routine was getting around 200 bytes per customer. Now it's getting around 100K. BIG performance difference!

What was the impact of the other suggestions I made?

Kristen


Kristen
Test

22859 Posts

Posted - 2007-09-11 : 03:29:21
"but in my case the difference was neglectible."

Yes, but that's not the only point as I tried to explain.

What was the impact of the other suggestions I made?

jonasfh
Starting Member

9 Posts

Posted - 2007-09-11 : 03:56:03
Well, I already had indexes on all joined columns, so that wasn't a problem. And I never tried adding "dbo." in front of all the names, as I considered this also negligible. Remember, my question was not about saving milliseconds on a query, but about the big difference between one way of phrasing the query and another. Do you know if there is a way to speed up the query using normal SQL, without having to save the result of the subquery in a variable? I'd like to have mostly clean SQL, as we might use different database engines for different situations in our application.

regards Jonas:))

quote:
Originally posted by Kristen

"but in my case the difference was neglectible."

Yes, but that's not the only point as I tried to explain.

What was the impact of the other suggestions I made?


Kristen
Test

22859 Posts

Posted - 2007-09-11 : 05:43:00
"I never tried adding "dbo." in front of all the names as I considered this also neglectible"

Why do you think that without trying it?

Yes, it probably is only saving MS, but I recommended it because it is good practice, and you should get into the habit of not using SELECT * and always prefixing tables etc. with DBO.

If you do not use the dbo prefix then SQL Server may not cache the query plan, because it has to resolve whether there MIGHT be a table owned by the specific user.

And if you don't adopt good practice then you risk that what you are building, and testing, stands on shaky ground, and the suppositions you are making will not hold up over time as the system evolves and changes. I explained, as an example, what happens with SELECT * when you add additional columns ...

"using normal SQL and not having to save the results of the subquery in a variable"

You probably aren't going to speed it up if you want a generic any-SQL-server solution.

You could look at the Query Plan, which will probably show why embedding the calculation of the "last 5 minutes" is slow; it is probably being evaluated at an inopportune point in the query process.

But personally I never put that sort of calculation as a sub-query as I reckon the optimiser is very rarely going to see it for what it is, and consequently we always pre-calculate such criteria.

Pre-calculating them also has the benefit that in debugging we can output the value which was actually used (i.e. as an intermediate step).

I actually proposed that you use WITH (NOLOCK), but that doesn't show in the code sample you posted above, and I also would not have used the syntax:

SET @Variable = (SELECT Expression FROM Table ...)

because it is just obscuring what you are trying to do, and increasing the risk of introducing errors during future maintenance.
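For comparison, the two assignment styles side by side (both set the variable to the same value):

```sql
DECLARE @StartTime datetime

-- The SET form wraps the query in an extra subquery:
SET @StartTime = (SELECT dateadd(minute, -2, max(date_time)) FROM dbo.data_values)

-- The direct SELECT assignment reads more plainly:
SELECT @StartTime = dateadd(minute, -2, max(date_time)) FROM dbo.data_values
```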

Kristen

Jenda
Starting Member

29 Posts

Posted - 2007-09-11 : 06:50:23
quote:
Originally posted by Kristen

select * from data_values v
inner join sensors s on (v.sensor_id=s.sensor_id)
inner join data_loggers l on (l.data_logger_id=s.data_logger_id )
where date_time>(select dateadd(minute, -2, max(date_time)) from data_values);

3) Make sure you have indexes on all Joined columns:
data_values.date_time
data_values.sensor_id
sensors.sensor_id
sensors.data_logger_id
data_loggers.data_logger_id



And what will the index on data_values.sensor_id be good for? That index might be helpful if, in a different query, the rows of data_values were filtered or sorted by this column, but if all queries look like this one ("find all rows in data_values that ... and then for each of them find the sensor"), then having an index on sensors.sensor_id is very helpful, but the index on data_values.sensor_id is pointless.

The fact that a column is mentioned in a JOIN is not enough to make it a sure candidate for an index. It signifies that maybe it could be a good candidate, but you should not blindly index everything that ever appeared in a JOIN. Indexes are not free.
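In concrete terms, a sketch of the indexes that would actually serve this query (names are illustrative; if sensor_id and data_logger_id are already primary keys, the last two are unnecessary):

```sql
-- Supports the date_time range filter on the big table.
CREATE INDEX IX_data_values_date_time ON dbo.data_values (date_time);

-- Support the join-key lookups; often already covered by primary keys.
CREATE INDEX IX_sensors_sensor_id ON dbo.sensors (sensor_id);
CREATE INDEX IX_data_loggers_dlid ON dbo.data_loggers (data_logger_id);
```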

jonasfh
Starting Member

9 Posts

Posted - 2007-09-11 : 07:23:13
Hmmm,
quote:

I actually proposed that you use WITH (NOLOCK), but that doesn't show in the code sample you posted above, and I also would not have used the syntax:


I guess I skipped the WITH (NOLOCK) for the same reason I skipped the dbo.: I didn't think it significant. Not that locking isn't significant, just that it couldn't explain the huge difference in execution speed I was getting. Also, it does not follow SQL92/99, which I try to follow for portability. As long as the query takes less than a tenth of a second I'm not going to spend time optimizing it, as my system is not that heavily loaded. I'd rather not use lots of non-standard features. If it's not possible to get reasonable query speeds with standard SQL, I'll consider another system.

Jonas:)

   
