rkruis
Starting Member
28 Posts
Posted - 2010-02-16 : 12:30:39
Not really sure how to explain this select statement, but I hope showing it will explain what I need and which part is causing it to be slow.

select distinct t1.col2
from t1, t2
where t1.col3 = t2.col3
  and t1.server_time >= t2.start_time
  and t1.server_time <= t2.end_time

The comparison I am having problems with is:

and t1.server_time >= t2.start_time
and t1.server_time <= t2.end_time

Does anyone know of a better way to do this? These fields are datetime type.
visakh16
Very Important crosS Applying yaK Herder
52326 Posts
Posted - 2010-02-16 : 12:37:43
t1.server_time between t2.start_time and t2.end_time

------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
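For anyone unsure why this helps readability without changing results: BETWEEN a AND b is exactly shorthand for the pair of inclusive comparisons. A quick sanity check, sketched here in Python with SQLite and entirely made-up t1/t2 data (the poster's real tables aren't available):

```python
import sqlite3

# Hypothetical miniature versions of the poster's t1/t2 tables, invented
# purely to show that BETWEEN returns the same rows as >= AND <=.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t1 (col2 TEXT, col3 INTEGER, server_time TEXT);
CREATE TABLE t2 (col3 INTEGER, start_time TEXT, end_time TEXT);
INSERT INTO t1 VALUES ('a', 1, '2010-02-15 01:00:00'),
                      ('b', 1, '2010-02-15 09:00:00'),
                      ('c', 2, '2010-02-15 23:00:00');
INSERT INTO t2 VALUES (1, '2010-02-15 00:00:00', '2010-02-15 12:00:00'),
                      (2, '2010-02-15 00:00:00', '2010-02-15 12:00:00');
""")

pair = con.execute("""
    SELECT DISTINCT t1.col2 FROM t1, t2
    WHERE t1.col3 = t2.col3
      AND t1.server_time >= t2.start_time
      AND t1.server_time <= t2.end_time
    ORDER BY t1.col2
""").fetchall()

between = con.execute("""
    SELECT DISTINCT t1.col2 FROM t1, t2
    WHERE t1.col3 = t2.col3
      AND t1.server_time BETWEEN t2.start_time AND t2.end_time
    ORDER BY t1.col2
""").fetchall()

print(pair == between)        # True: both forms return identical rows
print([r[0] for r in pair])   # only the rows inside the time window
```

Because the two predicates are logically identical, the speedup the poster saw is more likely plan-to-plan noise than a real difference.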
rkruis
Starting Member
28 Posts
Posted - 2010-02-16 : 12:43:41
That did help; it took the time from 4 seconds down to 3 seconds. Is there anything else I could do, maybe indexing?
visakh16
Very Important crosS Applying yaK Herder
52326 Posts
Posted - 2010-02-16 : 12:45:58
What indexes are currently present?
Kristen
Test
22859 Posts
Posted - 2010-02-16 : 12:53:20
Need to see the Query Plan really.

Use:

SET STATISTICS IO ON; SET STATISTICS TIME ON
... *** YOUR QUERY HERE *** ...
SET STATISTICS IO OFF; SET STATISTICS TIME OFF
GO

and make a note of the Logical I/O and the number of Scans. Use that as a baseline.

I presume you already have indexes on t1.col3 and t2.col3 (or they are already the PK + Clustered Index for those tables).

Try adding indexes on:

1 = t1.server_time
2 = t2.start_time
3 = t2.end_time

or possibly a compound index on t2.start_time, t2.end_time (instead of indexes 2 & 3).

What is:

select COUNT(*), COUNT(distinct t1.col2)

for your query? If the total count is much bigger than the distinct count then you might be better off using an EXISTS instead of a DISTINCT:

select t1.col2
from t1
WHERE EXISTS
( SELECT *
  FROM t2
  WHERE t2.col3 = t1.col3
    and t1.server_time >= t2.start_time
    and t1.server_time <= t2.end_time )
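The DISTINCT-vs-EXISTS point can be illustrated with a small sketch (Python + SQLite, invented data): when a t1 row falls inside several overlapping t2 windows, the join produces duplicate rows that DISTINCT must then remove, while EXISTS only asks whether at least one match exists. Both forms return the same answer:

```python
import sqlite3

# Illustrative-only data: col3 = 1 has two overlapping time windows in t2,
# so a plain join would emit the matching t1 row twice.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t1 (col2 TEXT, col3 INTEGER, server_time TEXT);
CREATE TABLE t2 (col3 INTEGER, start_time TEXT, end_time TEXT);
INSERT INTO t1 VALUES ('a', 1, '2010-02-15 01:00:00'),
                      ('b', 1, '2010-02-15 20:00:00');
INSERT INTO t2 VALUES (1, '2010-02-15 00:00:00', '2010-02-15 12:00:00'),
                      (1, '2010-02-15 00:30:00', '2010-02-15 06:00:00');
""")

distinct_join = con.execute("""
    SELECT DISTINCT t1.col2 FROM t1, t2
    WHERE t1.col3 = t2.col3
      AND t1.server_time BETWEEN t2.start_time AND t2.end_time
""").fetchall()

exists_form = con.execute("""
    SELECT t1.col2 FROM t1
    WHERE EXISTS (SELECT * FROM t2
                  WHERE t2.col3 = t1.col3
                    AND t1.server_time BETWEEN t2.start_time AND t2.end_time)
""").fetchall()

print(distinct_join, exists_form)  # same single row either way
```

The EXISTS form avoids materialising and de-duplicating the joined rows, which is where the potential saving comes from when the raw join count is much larger than the distinct count.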
rkruis
Starting Member
28 Posts
Posted - 2010-02-16 : 13:22:32
Hi Kristen, thanks for the suggestions. Here is what I have tried since, with the results of the scan.

- posdata = t1
- va_pos = t2

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
(1 row(s) affected)
Table 'posdata'. Scan count 2, logical reads 4034, physical reads 0, read-ahead reads 0.
Table 'va_pos'. Scan count 2, logical reads 237, physical reads 0, read-ahead reads 0.
SQL Server Execution Times: CPU time = 3421 ms, elapsed time = 2665 ms.
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.

* Added indexes on t1.col3 and t2.col3
* Already had indexes on:
  1 = t1.server_time
  2 = t2.start_time
  3 = t2.end_time
  (no cluster)

I tried using EXISTS in place of DISTINCT, which didn't improve the times.
Kristen
Test
22859 Posts
Posted - 2010-02-16 : 13:47:26
OK, need to see the query plan then please:

SET SHOWPLAN_TEXT ON
GO
... *** YOUR QUERY HERE *** ...
SET SHOWPLAN_TEXT OFF
GO
rkruis
Starting Member
28 Posts
Posted - 2010-02-16 : 14:56:34
Hope this is what you wanted.

select count(distinct p.receipt_no)
from posdata p, va_pos v
where v.register_num = p.register_num
  and p.server_time between v.start_time and v.end_time
  and rule_name like '%gas%'
  and p.server_time >= '2010-02-15 00:00:30.000'
  and p.server_time <= '2010-02-15 12:00:00.000'

|--Compute Scalar(DEFINE:([Expr1002]=Convert([globalagg1004])))
     |--Stream Aggregate(DEFINE:([globalagg1004]=SUM([partialagg1003])))
          |--Parallelism(Gather Streams)
               |--Stream Aggregate(DEFINE:([partialagg1003]=COUNT_BIG([p].[receipt_no])))
                    |--Hash Match(Aggregate, HASH:([p].[receipt_no]), RESIDUAL:([p].[receipt_no]=[p].[receipt_no]))
                         |--Parallelism(Repartition Streams, PARTITION COLUMNS:([p].[receipt_no]))
                              |--Hash Match(Inner Join, HASH:([p].[register_num])=([v].[register_num]), RESIDUAL:(([p].[register_num]=[v].[register_num] AND [p].[server_time]>=[v].[start_time]) AND [p].[server_time]<=[v].[end_time]))
                                   |--Parallelism(Repartition Streams, PARTITION COLUMNS:([p].[register_num]))
                                   |    |--Clustered Index Scan(OBJECT:([Vigil-DVR].[dbo].[posdata].[PK_posdata] AS [p]), WHERE:([p].[server_time]>='Feb 15 2010 12:00AM' AND [p].[server_time]<='Feb 15 2010 12:00PM'))
                                   |--Parallelism(Repartition Streams, PARTITION COLUMNS:([v].[register_num]))
                                        |--Clustered Index Scan(OBJECT:([Vigil-DVR].[dbo].[va_pos].[PK_va_pos] AS [v]), WHERE:(like([v].[rule_name], '%gas%', NULL)))
Kristen
Test
22859 Posts
Posted - 2010-02-17 : 02:38:55
Yeah, well that query is not the same as what you posted originally. It would have saved time if you had posted your actual query. You are using rule_name like '%gas%' - a LIKE with a leading wildcard cannot use an index seek, so the performance of that part is not going to improve.

How many rows in each table?

SELECT COUNT(*) FROM [Vigil-DVR].[dbo].[va_pos]
SELECT COUNT(*) FROM [Vigil-DVR].[dbo].[posdata]

Maybe there are only relatively few rows in [va_pos]? If so, the '%gas%' issue is probably less important.

Otherwise a couple of possible solutions:

1) Remove the LIKE test, nest this query, and put the LIKE in an outer query - provided the number of rows within the dates is modest.

2) Use a different, simpler way of getting the [rule_name] values that include '%gas%', e.g.:

AND rule_name IN (SELECT rule_name FROM SomeLookupTable WHERE rule_name like '%gas%')

or

AND rule_name IN ('Foo', 'Bar', ...)

Separately:

Do you have an index on [p].[server_time]? SQL is using the PK to find those values, which would be strange if an index existed.

If [register_num] is not the PK on [posdata], or it is the PK but is not using a Clustered Index, then it needs a compound index on [server_time], [register_num].

If you cannot get the Query Plan to change from this:

Clustered Index Scan(OBJECT:([Vigil-DVR].[dbo].[posdata].[PK_posdata] AS [p]), WHERE:([p].[server_time]>='Feb 15 2010 12:00AM' AND [p].[server_time]<='Feb 15 2010 12:00PM'))

to something like:

Clustered Index Seek(OBJECT:([Vigil-DVR].[dbo].[posdata].[Your server_time Index Name] ...

then try creating the compound index anyway.

Might be worth checking the query plan on this simple query first, to make sure that the index is used:

SET SHOWPLAN_TEXT ON
GO
SELECT COUNT(DISTINCT p.receipt_no)
FROM posdata AS p
WHERE p.server_time >= '2010-02-15 00:00:30.000'
  and p.server_time <= '2010-02-15 12:00:00.000'
GO
SET SHOWPLAN_TEXT OFF
GO
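Option 2 above can be sanity-checked in miniature. The sketch below uses Python with SQLite and invented rule names, purely to show that expressing the leading-wildcard LIKE as an IN over the distinct matching names returns the same rows (the performance benefit on the real server would depend on the actual tables and indexes):

```python
import sqlite3

# Made-up rule_name values for illustration only. A leading-wildcard LIKE
# can't seek an index, but the set of matching names can be computed once
# and the filter expressed as IN (...) over that set.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE va_pos (register_num INTEGER, rule_name TEXT);
INSERT INTO va_pos VALUES (1, 'premium gas'), (2, 'diesel'),
                          (3, 'gas discount'), (4, 'car wash');
""")

direct = con.execute(
    "SELECT register_num FROM va_pos WHERE rule_name LIKE '%gas%' ORDER BY 1"
).fetchall()

# Same predicate expressed as an IN over the distinct matching names
via_in = con.execute("""
    SELECT register_num FROM va_pos
    WHERE rule_name IN (SELECT DISTINCT rule_name FROM va_pos
                        WHERE rule_name LIKE '%gas%')
    ORDER BY 1
""").fetchall()

print(direct == via_in)   # True: identical result sets
```

In the real schema, the inner SELECT would run against a small lookup table of rule names (Kristen's hypothetical SomeLookupTable), so the expensive wildcard scan touches few rows.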
rkruis
Starting Member
28 Posts
Posted - 2010-02-17 : 12:17:57
I only posted the part that I could tell was causing an issue. I am somewhat new to SQL. :) When I ran the SELECT without the server_time but with the LIKE, it finished within a second. If I ran just the BETWEEN server_time part, it took 40+ seconds. However, you are correct; next time I will post the entire select.

va_pos = 28469 rows
posdata = 234394 rows

va_pos has far fewer rows, so the LIKE doesn't seem to be an issue.

1. With LIKE (4 seconds); with LIKE commented out (3 seconds)

In posdata there are 15 columns; column [p].[idx] is the PK and [p].[server_time] is the 5th column with a default value of GetTime(). There are 3 indexes, set up as follows:

name - cluster - columns
PK_posdata - Yes - idx (As you mention, this might be odd)
ix_dvr_id_servertime - No - server_time, dvr_id
IDX_REGISTER - No - receipt_no

I ran the last query plan; here is the result:

SELECT COUNT(DISTINCT p.receipt_no)
FROM posdata AS p
WHERE p.server_time >= '2010-02-15 00:00:30.000'
  and p.server_time <= '2010-02-15 12:00:00.000'

|--Compute Scalar(DEFINE:([Expr1001]=Convert([Expr1004])))
     |--Stream Aggregate(DEFINE:([Expr1004]=COUNT_BIG([p].[receipt_no])))
          |--Sort(DISTINCT ORDER BY:([p].[receipt_no] ASC))
               |--Hash Match(Inner Join, HASH:([Bmk1000])=([Bmk1000]), RESIDUAL:([Bmk1000]=[Bmk1000]))
                    |--Index Seek(OBJECT:([Vigil-DVR].[dbo].[posdata].[ix_dvr_id_servertime] AS [p]), SEEK:([p].[server_time] >= 'Feb 15 2010 12:00AM' AND [p].[server_time] <= 'Feb 15 2010 12:00PM') ORDERED FORWARD)
                    |--Index Scan(OBJECT:([Vigil-DVR].[dbo].[posdata].[IDX_REGISTER] AS [p]))

Talking with the guy who wrote this originally, he says that since we use MSDE we have a 2GB limit, which has been causing us issues with indexing, if that makes any sense.

I am going to try running the query into a temp table without the BETWEEN statement, then running against the temp table to get the count. Kristen, thank you for all the help.
Kristen
Test
22859 Posts
Posted - 2010-02-17 : 13:03:57
"Index Seek(OBJECT:([Vigil-DVR].[dbo].[posdata].[ix_dvr_id_servertime] AS ..."

So that query is using the index and doing an Index SEEK rather than a SCAN - which is what you want. Just need to make the main query do that too.

"Talking with the guy who wrote this originally, he says since we use MSDE, we have a 2GB limit which has been causing us issues with indexing. If that makes any sense."

Presumably you are sufficiently close to the 2GB limit that adding more indexes might mean you run out of space. Can't help with that: either you pay for a copy of SQL Server and then we can add indexes and do other things, or you use the free MSDE version and then you are stuck with the performance that you have got!

One other statistic please:

SELECT COUNT(*)
FROM [Vigil-DVR].[dbo].[va_pos]
WHERE rule_name like '%gas%'

so we can see how many of the 28,469 rows are for "gas".

The fact that the LIKE with the gas wildcards is not significantly slower is not, of itself, necessarily an indication about the main query. SQL Server may choose to run the main query differently because of the presence of that wildcard - i.e. if it was not there, or was there in a more efficient form, the Query Planner would choose a different approach, and that might be significantly faster.

But I think what we really want to see is the index [ix_dvr_id_servertime] being used in the Query Plan for the main query.
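The seek-vs-scan check generalises beyond SQL Server. As a rough analogue (Python + SQLite, with an invented table and index name), SQLite's EXPLAIN QUERY PLAN plays the role of SET SHOWPLAN_TEXT: "SEARCH ... USING INDEX" corresponds to a seek, a bare "SCAN" to a full scan:

```python
import sqlite3

# Invented miniature posdata with an index on server_time, to show how a
# range predicate on the indexed column shows up as an index SEARCH (seek)
# rather than a table SCAN in the plan output.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE posdata (idx INTEGER PRIMARY KEY, server_time TEXT, receipt_no INTEGER);
CREATE INDEX ix_servertime ON posdata (server_time);
""")

plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT COUNT(DISTINCT receipt_no) FROM posdata
    WHERE server_time >= '2010-02-15 00:00:30'
      AND server_time <= '2010-02-15 12:00:00'
""").fetchall()

for row in plan:
    print(row[-1])   # e.g. a SEARCH line naming ix_servertime
```

The idea is the same as in the thread: confirm from the plan text that the range predicate is being answered via the index before worrying about anything else.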
Kristen
Test
22859 Posts
Posted - 2010-02-17 : 13:16:56
P.S. I have been assuming normal housekeeping is being done, but maybe not. Either way, please run this and then re-run the Query Plan for the main query and see if it uses the [ix_dvr_id_servertime] index:

USE [Vigil-DVR]
GO
UPDATE STATISTICS dbo.va_pos WITH FULLSCAN
GO
UPDATE STATISTICS dbo.posdata WITH FULLSCAN
GO
rkruis
Starting Member
28 Posts
Posted - 2010-02-17 : 14:52:04
Yes, unfortunately there is no chance in the near future that we will upgrade from SQL 2000 MSDE. And yes, we are close to the 2GB limit; the one table actually has closer to 2.5 million records. I am running this query on just the last few days of data, copied to my db.

Ran the full scan on both tables and the speed did not change. I know regular maintenance is being done; I looked at the Logical Scan Fragmentation and both tables are less than 5%.

In my database, I have 28,469 records and 13,466 are gas related.

What I am trying to accomplish with this select is to get the count of rule_name (gas or diesel or ...) between a time period, to show how many customers got gas, or diesel, or whatever else could be logged in the rule_name field.

What I have done is put this select inside a stored procedure with 3 parameters: one being Rule, the other two being the between dates. It returns the count found. I understand the difference between a seek and a scan, but how to get the main query to seek, I have no idea. :)
Kristen
Test
22859 Posts
Posted - 2010-02-18 : 03:07:10
What is the PK on va_pos? And on posdata? Better still would be the DDL (CREATE TABLE / CREATE INDEX) for both tables.

"If [register_num] is not the PK on [posdata], or it is the PK but is not using a Clustered Index, then it needs a compound index [server_time], [register_num]"

Have you tried adding a compound index with both these columns as an experiment, and seeing if that index gets used in the query plan? It only needs to be done as an experiment and can be dropped afterwards.

"I understand the difference between a seek and a scan, but to get the main query to seek, I have no idea"

Well, you could force it with a HINT (the hint goes on posdata, the table that owns [ix_dvr_id_servertime]). Normally that doesn't improve on what the query planner does, though.

SELECT COUNT(distinct p.receipt_no)
FROM posdata AS p WITH (INDEX(ix_dvr_id_servertime))
JOIN va_pos AS v
    ON v.register_num = p.register_num
WHERE p.server_time between v.start_time and v.end_time
  and rule_name like '%gas%'
  and p.server_time >= '20100215 00:00:30.000'
  and p.server_time <= '20100215 12:00:00.000'

Try it also with your test compound index's name.

Also try a compound index that includes [receipt_no] as well - i.e. [server_time], [register_num], [receipt_no].

I also favour trying with the GAS test moved out of the main query - GAS is 50% of the records, and the query planner is probably taking a lousy route to solving that (but it cannot use any Statistics to know that):

SELECT COUNT(distinct receipt_no)
FROM
( SELECT p.receipt_no, rule_name
  FROM posdata AS p
  JOIN va_pos AS v -- Try also WITH INDEX hint if you like
      ON v.register_num = p.register_num
  WHERE p.server_time between v.start_time and v.end_time
    AND p.server_time >= '20100215 00:00:30.000'
    AND p.server_time <= '20100215 12:00:00.000'
) AS X
WHERE rule_name like '%gas%'
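The derived-table rewrite above changes only where the LIKE is evaluated, not what is counted. A toy demonstration (Python + SQLite, with invented posdata/va_pos rows and a simplified schema) showing the flat and nested forms agree:

```python
import sqlite3

# Toy posdata/va_pos tables (invented values, simplified columns): pushing
# the LIKE test into an outer query over a derived table returns the same
# count as the original single-level query.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE posdata (receipt_no INTEGER, register_num INTEGER, server_time TEXT);
CREATE TABLE va_pos  (register_num INTEGER, rule_name TEXT,
                      start_time TEXT, end_time TEXT);
INSERT INTO posdata VALUES (100, 1, '2010-02-15 01:00:00'),
                           (101, 1, '2010-02-15 02:00:00'),
                           (102, 2, '2010-02-15 03:00:00');
INSERT INTO va_pos VALUES (1, 'gas sale', '2010-02-15 00:00:30', '2010-02-15 12:00:00'),
                          (2, 'car wash', '2010-02-15 00:00:30', '2010-02-15 12:00:00');
""")

flat = con.execute("""
    SELECT COUNT(DISTINCT p.receipt_no)
    FROM posdata p JOIN va_pos v ON v.register_num = p.register_num
    WHERE p.server_time BETWEEN v.start_time AND v.end_time
      AND v.rule_name LIKE '%gas%'
""").fetchone()[0]

nested = con.execute("""
    SELECT COUNT(DISTINCT receipt_no)
    FROM (SELECT p.receipt_no, v.rule_name
          FROM posdata p JOIN va_pos v ON v.register_num = p.register_num
          WHERE p.server_time BETWEEN v.start_time AND v.end_time) AS X
    WHERE rule_name LIKE '%gas%'
""").fetchone()[0]

print(flat, nested)   # same count from both forms
```

Whether the nested form is actually faster depends on the optimiser: the hope in the thread is that with the wildcard LIKE out of the join, the planner picks an index-driven plan for the date-range join and applies the LIKE to the much smaller intermediate result.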