
 select between two tables slow


rkruis
Starting Member

28 Posts

Posted - 2010-02-16 : 12:30:39
Not really sure how to explain this select statement, but I hope showing it will make clear what I need and which part is causing it to be slow.

select distinct t1.col2
from t1, t2
where t1.col3 = t2.col3
and t1.server_time >= t2.start_time
and t1.server_time <= t2.end_time

The compare I am having problems with is

and t1.server_time >= t2.start_time
and t1.server_time <= t2.end_time

Does anyone know of a better way to do this?
These fields are datetime type.

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-02-16 : 12:37:43
t1.server_time between t2.start_time and t2.end_time

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/


rkruis
Starting Member

28 Posts

Posted - 2010-02-16 : 12:43:41
That did help - it took the time from 4 seconds down to 3 seconds.
Is there anything else I could do, maybe indexing?

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-02-16 : 12:45:58
What indexes are currently present?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/


Kristen
Test

22859 Posts

Posted - 2010-02-16 : 12:53:20
Need to see the Query Plan really.

Use

SET STATISTICS IO ON; SET STATISTICS TIME ON

... *** YOUR QUERY HERE *** ...

SET STATISTICS IO OFF; SET STATISTICS TIME OFF
GO

and make a note of the Logical I/O and the number of Scans. Use that as a baseline.

I presume you already have indexes on t1.col3 and t2.col3 (or they are already the PK + Clustered Index for those tables)

Try adding indexes on

1 = t1.server_time
2 = t2.start_time
3 = t2.end_time

or possibly a compound index on t2 (start_time, end_time) instead of indexes 2 & 3
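For reference, the DDL for those suggestions might look like this (the index names here are just examples - use your own naming convention):

```sql
-- Single-column indexes on the range columns
CREATE INDEX IX_t1_server_time ON t1 (server_time)
CREATE INDEX IX_t2_start_time ON t2 (start_time)
CREATE INDEX IX_t2_end_time ON t2 (end_time)

-- Or, instead of indexes 2 & 3, the compound alternative:
CREATE INDEX IX_t2_start_end ON t2 (start_time, end_time)
```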

What is:

select COUNT(*), COUNT(distinct t1.col2)

for your query? If the total count is much bigger than the Distinct count then you might be better off using an EXISTS instead of a DISTINCT:

select t1.col2
from t1
WHERE EXISTS
(
SELECT *
FROM t2
WHERE t2.col3 = t1.col3
and t1.server_time >= t2.start_time
and t1.server_time <= t2.end_time
)

rkruis
Starting Member

28 Posts

Posted - 2010-02-16 : 13:22:32
Hi Kristen, thanks for the suggestions. Here is what I have tried since, with the results of the scan.

- posdata = t1
- va_pos = t2

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

(1 row(s) affected)

Table 'posdata'. Scan count 2, logical reads 4034, physical reads 0, read-ahead reads 0.
Table 'va_pos'. Scan count 2, logical reads 237, physical reads 0, read-ahead reads 0.

SQL Server Execution Times:
CPU time = 3421 ms, elapsed time = 2665 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.


* Added in indexes to t1.col3 and t2.col3
* Already had indexes on 1 = t1.server_time 2 = t2.start_time 3 = t2.end_time (No Cluster)

I tried using EXISTS in place of DISTINCT, which didn't improve the times.

Kristen
Test

22859 Posts

Posted - 2010-02-16 : 13:47:26
OK, need to see the query plan then please:

SET SHOWPLAN_TEXT ON
GO

... *** YOUR QUERY HERE *** ...

SET SHOWPLAN_TEXT OFF
GO

rkruis
Starting Member

28 Posts

Posted - 2010-02-16 : 14:56:34
Hope this is what you wanted.

select count(distinct p.receipt_no)
from posdata p, va_pos v
where v.register_num = p.register_num
and p.server_time between v.start_time and v.end_time
and rule_name like '%gas%'
and p.server_time >= '2010-02-15 00:00:30.000'
and p.server_time <= '2010-02-15 12:00:00.000'



|--Compute Scalar(DEFINE:([Expr1002]=Convert([globalagg1004])))
     |--Stream Aggregate(DEFINE:([globalagg1004]=SUM([partialagg1003])))
          |--Parallelism(Gather Streams)
               |--Stream Aggregate(DEFINE:([partialagg1003]=COUNT_BIG([p].[receipt_no])))
                    |--Hash Match(Aggregate, HASH:([p].[receipt_no]), RESIDUAL:([p].[receipt_no]=[p].[receipt_no]))
                         |--Parallelism(Repartition Streams, PARTITION COLUMNS:([p].[receipt_no]))
                              |--Hash Match(Inner Join, HASH:([p].[register_num])=([v].[register_num]), RESIDUAL:(([p].[register_num]=[v].[register_num] AND [p].[server_time]>=[v].[start_time]) AND [p].[server_time]<=[v].[end_time]))
                                   |--Parallelism(Repartition Streams, PARTITION COLUMNS:([p].[register_num]))
                                   |    |--Clustered Index Scan(OBJECT:([Vigil-DVR].[dbo].[posdata].[PK_posdata] AS [p]), WHERE:([p].[server_time]>='Feb 15 2010 12:00AM' AND [p].[server_time]<='Feb 15 2010 12:00PM'))
                                   |--Parallelism(Repartition Streams, PARTITION COLUMNS:([v].[register_num]))
                                        |--Clustered Index Scan(OBJECT:([Vigil-DVR].[dbo].[va_pos].[PK_va_pos] AS [v]), WHERE:(like([v].[rule_name], '%gas%', NULL)))

Kristen
Test

22859 Posts

Posted - 2010-02-17 : 02:38:55
Yeah, well that query is not the same as what you posted originally. It would have saved time if you had posted your actual query. You are using rule_name like '%gas%' - a LIKE with a leading wildcard cannot use an index, so the performance of that part is not going to improve.

How many rows in each table?

SELECT COUNT(*) FROM [Vigil-DVR].[dbo].[va_pos]
SELECT COUNT(*) FROM [Vigil-DVR].[dbo].[posdata]

Maybe there are only relatively few rows in [va_pos]? If so, the '%gas%' issue is probably less important.

Otherwise a couple of possible solutions:

1) Remove the LIKE test, nest this query, and put the LIKE in an outer query - provided the number of rows within the dates is modest.

2) use a different, simpler, way of getting [rule_name] values that include '%gas%' e.g.:

AND rule_name IN (SELECT rule_name FROM SomeLookupTable WHERE rule_name like '%gas%')

or

AND rule_name IN ('Foo', 'Bar', ...)

Separately:

Do you have an index on [p].[server_time]? SQL is using the PK to find those values, which would be strange if an index existed.

If [register_num] is not the PK on [posdata], or it is the PK but is not using a Clustered Index, then it needs a compound index [server_time], [register_num]

If you cannot get the Query Plan to change from this:

Clustered Index Scan(OBJECT:([Vigil-DVR].[dbo].[posdata].[PK_posdata] AS [p]), WHERE:([p].[server_time]>='Feb 15 2010 12:00AM' AND [p].[server_time]<='Feb 15 2010 12:00PM'))

to something like

Clustered Index Seek(OBJECT:([Vigil-DVR].[dbo].[posdata].[Your server_time Index Name] ...

then try creating the compound index anyway.
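Assuming the column names from your query, creating that compound index would look something like this (the index name is a placeholder):

```sql
CREATE INDEX IX_posdata_servertime_register
ON posdata (server_time, register_num)
```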

Might be worth checking the query plan on this simple query first, to make sure that the index is used:

SET SHOWPLAN_TEXT ON
GO
SELECT COUNT(DISTINCT p.receipt_no)
FROM posdata AS p
WHERE p.server_time >= '2010-02-15 00:00:30.000'
and p.server_time <= '2010-02-15 12:00:00.000'
GO
SET SHOWPLAN_TEXT OFF
GO

rkruis
Starting Member

28 Posts

Posted - 2010-02-17 : 12:17:57
I only posted the part that I could tell was causing an issue - I am somewhat new to SQL. :) When I ran the SELECT without the server_time part but with the LIKE, it finished within a second. If I ran just the BETWEEN server_time part, it took 40+ seconds. However, you are correct; next time I will post the entire select.

va_pos = 28469 rows
posdata = 234394 rows

va_pos has far fewer rows, so the LIKE doesn't seem to be an issue.

1. With LIKE: 4 seconds; with LIKE commented out: 3 seconds.

In posdata there are 15 columns; column [p].[idx] is the PK and [p].[server_time] is the 5th column with a default value of GETDATE().
There are 3 indexes set up as follows.
name - cluster - columns
PK_posdata - Yes - idx (As you mention, this might be odd)
ix_dvr_id_servertime - No - server_time, dvr_id
IDX_REGISTER - No - receipt_no

I ran the last query plan, here is the result.

SELECT COUNT(DISTINCT p.receipt_no)
FROM posdata AS p
WHERE p.server_time >= '2010-02-15 00:00:30.000'
AND p.server_time <= '2010-02-15 12:00:00.000'

|--Compute Scalar(DEFINE:([Expr1001]=Convert([Expr1004])))
     |--Stream Aggregate(DEFINE:([Expr1004]=COUNT_BIG([p].[receipt_no])))
          |--Sort(DISTINCT ORDER BY:([p].[receipt_no] ASC))
               |--Hash Match(Inner Join, HASH:([Bmk1000])=([Bmk1000]), RESIDUAL:([Bmk1000]=[Bmk1000]))
                    |--Index Seek(OBJECT:([Vigil-DVR].[dbo].[posdata].[ix_dvr_id_servertime] AS [p]), SEEK:([p].[server_time] >= 'Feb 15 2010 12:00AM' AND [p].[server_time] <= 'Feb 15 2010 12:00PM') ORDERED FORWARD)
                    |--Index Scan(OBJECT:([Vigil-DVR].[dbo].[posdata].[IDX_REGISTER] AS [p]))


Talking with the guy who wrote this originally, he says that since we use MSDE we have a 2 GB limit, which has been causing us issues with indexing. If that makes any sense.

I am going to try and run the query into a temp table without the BETWEEN statement, then run it against the temp table to get the count.
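As a rough sketch of that temp-table approach (untested - adjust column names and filters to match the actual query):

```sql
-- Stage only the date-restricted rows, without the BETWEEN join
SELECT p.receipt_no, p.register_num, p.server_time
INTO #pos
FROM posdata AS p
WHERE p.server_time >= '20100215 00:00:30.000'
AND p.server_time <= '20100215 12:00:00.000'

-- Then join the small temp table to va_pos to get the count
SELECT COUNT(DISTINCT t.receipt_no)
FROM #pos AS t
JOIN va_pos AS v
ON v.register_num = t.register_num
WHERE t.server_time BETWEEN v.start_time AND v.end_time
AND v.rule_name LIKE '%gas%'

DROP TABLE #pos
```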

Kristen, thank you for all the help.

Kristen
Test

22859 Posts

Posted - 2010-02-17 : 13:03:57
"Index Seek(OBJECT:([Vigil-DVR].[dbo].[posdata].[ix_dvr_id_servertime] AS ...

So that query is using the Index and doing an Index SEEK rather than a SCAN - which is what you want.

Just need to make the main query do that too.

"Talking with the guy who wrote this originally, he says that since we use MSDE we have a 2 GB limit, which has been causing us issues with indexing. If that makes any sense."

Presumably you are sufficiently close to the 2GB limit that adding more indexes might mean you run out of space.

Can't help with that: either you pay for a copy of SQL Server, and then we can add indexes and do other things,
or you use the free MSDE version, and then you are stuck with the performance that you have got!

One other statistic please:

SELECT COUNT(*) FROM [Vigil-DVR].[dbo].[va_pos] WHERE rule_name like '%gas%'

so we can see how many of the 28,469 rows are for "gas"

The fact that the LIKE + gas wildcards query is not significantly slower is not, of itself, necessarily an indication about the main query.
SQL Server may choose to run the main query differently because of the presence of that wildcard - i.e. if it was not there,
or was there in a more efficient form, the Query Planner would choose a different approach, and that might be significantly faster.

But I think what we really want to see is index [ix_dvr_id_servertime] being used in the Query Plan for the main query

Kristen
Test

22859 Posts

Posted - 2010-02-17 : 13:16:56
P.S. I have been assuming normal housekeeping is being done, but maybe not.
Either way please run this and then re-run the Query Plan for the main query and see if it uses the [ix_dvr_id_servertime] index

USE [Vigil-DVR]
GO

UPDATE STATISTICS dbo.va_pos WITH FULLSCAN
GO
UPDATE STATISTICS dbo.posdata WITH FULLSCAN
GO

rkruis
Starting Member

28 Posts

Posted - 2010-02-17 : 14:52:04
Yes, unfortunately there is no chance in the near future that we will upgrade from SQL 2000 MSDE. And yes, we are close to the 2 GB limit. One table actually has closer to 2.5 million records. I am running this query on just the last few days of data, copied to my db.

Ran the full scan on both tables and the speed did not change. I know regular maintenance is being done. I looked at the Logical Scan Fragmentation; both tables are less than 5%.

In my database, I have 28,469 records and 13,466 are gas related.

What I am trying to accomplish with this select, is to get the count of rule_name(gas or diesel or ...) between a time period to show how many customers got gas, or diesel or whatever else could be logged in the rule_name field.

What I have done, is I have this select inside a stored procedure with 3 parameters. One being Rule, other two being the between dates. Which returns a count found.

I understand the difference between a seek and a scan, but I have no idea how to get the main query to do a seek :)
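For context, the stored procedure is roughly shaped like this (the procedure and parameter names here are made up for illustration; the body is the same SELECT as above):

```sql
CREATE PROCEDURE dbo.usp_CountRuleReceipts
    @Rule varchar(50),
    @StartTime datetime,
    @EndTime datetime
AS
SELECT COUNT(DISTINCT p.receipt_no) AS receipt_count
FROM posdata AS p
JOIN va_pos AS v
    ON v.register_num = p.register_num
WHERE p.server_time BETWEEN v.start_time AND v.end_time
AND v.rule_name LIKE '%' + @Rule + '%'
AND p.server_time >= @StartTime
AND p.server_time <= @EndTime
GO
```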

Kristen
Test

22859 Posts

Posted - 2010-02-18 : 03:07:10
What is the PK on va_pos? and on posdata?

Better still would be the DDL Create Table / Index for both tables.

"If [register_num] is not the PK on [posdata], or it is the PK but is not using a Clustered Index, then it needs a compound index [server_time], [register_num]"

Have you tried adding a compound index with both these columns as an experiment and seeing if that index gets used in the query plan? Only needs to be done as an experiment and can be dropped afterwards.

"I understand the difference between a seek and a scan, but to get the main query to seek, I have no idea on"

Well you could force it with a HINT. Normally doesn't improve what the query planner does though.

SELECT COUNT(distinct p.receipt_no)
FROM posdata AS p WITH (INDEX(ix_dvr_id_servertime))
JOIN va_pos AS v
ON v.register_num = p.register_num
WHERE p.server_time between v.start_time and v.end_time
and rule_name like '%gas%'
and p.server_time >= '20100215 00:00:30.000'
and p.server_time <= '20100215 12:00:00.000'

Try it also with your test compound index's name

Also, try a compound index also including [receipt_no] - i.e. [server_time], [register_num], [receipt_no]

I also favour trying with the GAS test moved out of the main query - GAS is 50% of the records, the query planner is probably taking a lousy route to solving that (but it cannot use any Statistics to know that)

SELECT COUNT(distinct receipt_no)
FROM
(
SELECT p.receipt_no, rule_name
FROM posdata AS p
JOIN va_pos AS v -- Try also WITH INDEX hint if you like
ON v.register_num = p.register_num
WHERE p.server_time between v.start_time and v.end_time
AND p.server_time >= '20100215 00:00:30.000'
AND p.server_time <= '20100215 12:00:00.000'
) AS X
WHERE rule_name like '%gas%'
   
