
 select between two tables slow


rkruis
Starting Member

28 Posts

Posted - 2010-02-16 : 12:30:39
Not really sure how to explain this select statement, but I hope showing it will make clear what I need and which part is causing it to be slow.

select distinct t1.col2
from t1, t2
where t1.col3 = t2.col3
and t1.server_time >= t2.start_time
and t1.server_time <= t2.end_time

The compare I am having problems with is

and t1.server_time >= t2.start_time
and t1.server_time <= t2.end_time

Does anyone know of a better way to do this?
These fields are datetime type.

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-02-16 : 12:37:43
t1.server_time between t2.start_time and t2.end_time

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/


rkruis
Starting Member

28 Posts

Posted - 2010-02-16 : 12:43:41
That did help - it took the time from 4 seconds down to 3 seconds.
Is there anything else I could do, maybe indexing?

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-02-16 : 12:45:58
What indexes are currently present?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/


Kristen
Test

22859 Posts

Posted - 2010-02-16 : 12:53:20
Need to see the Query Plan really.

Use

SET STATISTICS IO ON; SET STATISTICS TIME ON

... *** YOUR QUERY HERE *** ...

SET STATISTICS IO OFF; SET STATISTICS TIME OFF
GO

and make a note of the Logical I/O and the number of Scans. Use that as a baseline.

I presume you already have indexes on t1.col3 and t2.col3 (or they are already the PK + Clustered Index for those tables)

Try adding indexes on

1 = t1.server_time
2 = t2.start_time
3 = t2.end_time

or possibly a compound index on t2 (start_time, end_time) instead of indexes 2 & 3
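For reference, the DDL for those suggestions might look like this (the index names here are just examples - use your own naming convention):

```sql
-- Single-column indexes on the range columns
CREATE INDEX IX_t1_server_time ON t1 (server_time)
CREATE INDEX IX_t2_start_time ON t2 (start_time)
CREATE INDEX IX_t2_end_time ON t2 (end_time)

-- Or, instead of indexes 2 & 3, the compound alternative:
CREATE INDEX IX_t2_start_end ON t2 (start_time, end_time)
```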

What is:

select COUNT(*), COUNT(distinct t1.col2)

for your query? If the total count is much bigger than the Distinct count then you might be better off using an EXISTS instead of a DISTINCT:

select t1.col2
from t1
WHERE EXISTS
(
SELECT *
FROM t2
WHERE t2.col3 = t1.col3
and t1.server_time >= t2.start_time
and t1.server_time <= t2.end_time
)

rkruis
Starting Member

28 Posts

Posted - 2010-02-16 : 13:22:32
Hi Kristen, thanks for the suggestions. Here is what I have tried since, with the results of the scan.

- posdata = t1
- va_pos = t2

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

(1 row(s) affected)

Table 'posdata'. Scan count 2, logical reads 4034, physical reads 0, read-ahead reads 0.
Table 'va_pos'. Scan count 2, logical reads 237, physical reads 0, read-ahead reads 0.

SQL Server Execution Times:
CPU time = 3421 ms, elapsed time = 2665 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.


* Added in indexes to t1.col3 and t2.col3
* Already had indexes on 1 = t1.server_time 2 = t2.start_time 3 = t2.end_time (No Cluster)

I tried using EXISTS in place of DISTINCT, which didn't improve the times.

Kristen
Test

22859 Posts

Posted - 2010-02-16 : 13:47:26
OK, need to see the query plan then please:

SET SHOWPLAN_TEXT ON
GO

... *** YOUR QUERY HERE *** ...

SET SHOWPLAN_TEXT OFF
GO

rkruis
Starting Member

28 Posts

Posted - 2010-02-16 : 14:56:34
Hope this is what you wanted.

select count(distinct p.receipt_no)
from posdata p, va_pos v
where v.register_num = p.register_num
and p.server_time between v.start_time and v.end_time
and rule_name like '%gas%'
and p.server_time >= '2010-02-15 00:00:30.000'
and p.server_time <= '2010-02-15 12:00:00.000'



|--Compute Scalar(DEFINE:([Expr1002]=Convert([globalagg1004])))
     |--Stream Aggregate(DEFINE:([globalagg1004]=SUM([partialagg1003])))
          |--Parallelism(Gather Streams)
               |--Stream Aggregate(DEFINE:([partialagg1003]=COUNT_BIG([p].[receipt_no])))
                    |--Hash Match(Aggregate, HASH:([p].[receipt_no]), RESIDUAL:([p].[receipt_no]=[p].[receipt_no]))
                         |--Parallelism(Repartition Streams, PARTITION COLUMNS:([p].[receipt_no]))
                              |--Hash Match(Inner Join, HASH:([p].[register_num])=([v].[register_num]), RESIDUAL:(([p].[register_num]=[v].[register_num] AND [p].[server_time]>=[v].[start_time]) AND [p].[server_time]<=[v].[end_time]))
                                   |--Parallelism(Repartition Streams, PARTITION COLUMNS:([p].[register_num]))
                                   |    |--Clustered Index Scan(OBJECT:([Vigil-DVR].[dbo].[posdata].[PK_posdata] AS [p]), WHERE:([p].[server_time]>='Feb 15 2010 12:00AM' AND [p].[server_time]<='Feb 15 2010 12:00PM'))
                                   |--Parallelism(Repartition Streams, PARTITION COLUMNS:([v].[register_num]))
                                        |--Clustered Index Scan(OBJECT:([Vigil-DVR].[dbo].[va_pos].[PK_va_pos] AS [v]), WHERE:(like([v].[rule_name], '%gas%', NULL)))

Kristen
Test

22859 Posts

Posted - 2010-02-17 : 02:38:55
Yeah, well that query is not the same as what you posted originally. It would have saved time if you had posted your actual query. You are using rule_name like '%gas%' - a LIKE with a leading wildcard cannot use an index, so the performance of that part is not going to improve.

How many rows in each table?

SELECT COUNT(*) FROM [Vigil-DVR].[dbo].[va_pos]
SELECT COUNT(*) FROM [Vigil-DVR].[dbo].[posdata]

Maybe there are only relatively few rows in [va_pos]? If so, the '%gas%' issue is probably less important.

Otherwise a couple of possible solutions:

1) Remove the LIKE test, nest this query, and put the LIKE in an outer query - provided the number of rows within the dates is modest.

2) use a different, simpler, way of getting [rule_name] values that include '%gas%' e.g.:

AND rule_name IN (SELECT rule_name FROM SomeLookupTable WHERE rule_name like '%gas%')

or

AND rule_name IN ('Foo', 'Bar', ...)

Separately:

Do you have an index on [p].[server_time]? SQL is using the PK to find those values, which would be strange if an index existed.

If [register_num] is not the PK on [posdata], or it is the PK but is not using a Clustered Index, then it needs a compound index [server_time], [register_num]

If you cannot get the Query Plan to change from this:

Clustered Index Scan(OBJECT:([Vigil-DVR].[dbo].[posdata].[PK_posdata] AS [p]), WHERE:([p].[server_time]>='Feb 15 2010 12:00AM' AND [p].[server_time]<='Feb 15 2010 12:00PM'))

to something like

Clustered Index Seek(OBJECT:([Vigil-DVR].[dbo].[posdata].[Your server_time Index Name] ...

then try creating the compound index anyway.
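Assuming the column names from your query, creating that compound index would look something like this (the index name is a placeholder):

```sql
CREATE INDEX IX_posdata_servertime_register
ON posdata (server_time, register_num)
```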

Might be worth checking the query plan on this simple query first, to make sure that the index is used:

SET SHOWPLAN_TEXT ON
GO
SELECT COUNT(DISTINCT p.receipt_no)
FROM posdata AS p
WHERE p.server_time >= '2010-02-15 00:00:30.000'
and p.server_time <= '2010-02-15 12:00:00.000'
GO
SET SHOWPLAN_TEXT OFF
GO

rkruis
Starting Member

28 Posts

Posted - 2010-02-17 : 12:17:57
I only posted the part that I could tell was causing an issue - I am somewhat new to SQL. :) When I ran the SELECT without the server_time part but with the LIKE, it finished within a second. If I ran just the BETWEEN server_time part, it took 40+ seconds. However, you are correct; next time I will post the entire select.

va_pos = 28469 rows
posdata = 234394 rows

va_pos has far fewer rows, so the LIKE doesn't seem to be an issue.

1. With LIKE: 4 seconds; with LIKE commented out: 3 seconds.

In posdata there are 15 columns; column [p].[idx] is the PK and [p].[server_time] is the 5th column with a default value of GETDATE().
There are 3 indexes set up as follows.
name - cluster - columns
PK_posdata - Yes - idx (As you mention, this might be odd)
ix_dvr_id_servertime - No - server_time, dvr_id
IDX_REGISTER - No - receipt_no

I ran the last query plan, here is the result.

SELECT COUNT(DISTINCT p.receipt_no)
FROM posdata AS p
WHERE p.server_time >= '2010-02-15 00:00:30.000'
AND p.server_time <= '2010-02-15 12:00:00.000'

|--Compute Scalar(DEFINE:([Expr1001]=Convert([Expr1004])))
     |--Stream Aggregate(DEFINE:([Expr1004]=COUNT_BIG([p].[receipt_no])))
          |--Sort(DISTINCT ORDER BY:([p].[receipt_no] ASC))
               |--Hash Match(Inner Join, HASH:([Bmk1000])=([Bmk1000]), RESIDUAL:([Bmk1000]=[Bmk1000]))
                    |--Index Seek(OBJECT:([Vigil-DVR].[dbo].[posdata].[ix_dvr_id_servertime] AS [p]), SEEK:([p].[server_time] >= 'Feb 15 2010 12:00AM' AND [p].[server_time] <= 'Feb 15 2010 12:00PM') ORDERED FORWARD)
                    |--Index Scan(OBJECT:([Vigil-DVR].[dbo].[posdata].[IDX_REGISTER] AS [p]))


Talking with the guy who wrote this originally, he says that since we use MSDE we have a 2 GB limit, which has been causing us issues with indexing. If that makes any sense.

I am going to try and run the query into a temp table without the BETWEEN statement, then run it against the temp table to get the count.
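As a rough sketch of that temp-table approach (untested - adjust column names and filters to match the actual query):

```sql
-- Stage only the date-restricted rows, without the BETWEEN join
SELECT p.receipt_no, p.register_num, p.server_time
INTO #pos
FROM posdata AS p
WHERE p.server_time >= '20100215 00:00:30.000'
AND p.server_time <= '20100215 12:00:00.000'

-- Then join the small temp table to va_pos to get the count
SELECT COUNT(DISTINCT t.receipt_no)
FROM #pos AS t
JOIN va_pos AS v
ON v.register_num = t.register_num
WHERE t.server_time BETWEEN v.start_time AND v.end_time
AND v.rule_name LIKE '%gas%'

DROP TABLE #pos
```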

Kristen, thank you for all the help.

Kristen
Test

22859 Posts

Posted - 2010-02-17 : 13:03:57
"Index Seek(OBJECT:([Vigil-DVR].[dbo].[posdata].[ix_dvr_id_servertime] AS ...

So that query is using the Index and doing an Index SEEK rather than a SCAN - which is what you want.

Just need to make the main query do that too.

"Talking with the guy who wrote this originally, he says that since we use MSDE we have a 2 GB limit, which has been causing us issues with indexing. If that makes any sense."

Presumably you are sufficiently close to the 2GB limit that adding more indexes might mean you run out of space.

Can't help with that: either you pay for a copy of SQL Server, and then we can add indexes and do other things,
or you use the free MSDE version, and then you are stuck with the performance that you have got!

One other statistic please:

SELECT COUNT(*) FROM [Vigil-DVR].[dbo].[va_pos] WHERE rule_name like '%gas%'

so we can see how many of the 28,469 rows are for "gas"

The fact that the LIKE + gas wildcards query is not significantly slower is not, of itself, necessarily an indication about the main query.
SQL Server may choose to run the main query differently because of the presence of that wildcard - i.e. if it was not there,
or was there in a more efficient form, the Query Planner would choose a different approach, and that might be significantly faster.

But I think what we really want to see is index [ix_dvr_id_servertime] being used in the Query Plan for the main query

Kristen
Test

22859 Posts

Posted - 2010-02-17 : 13:16:56
P.S. I have been assuming normal housekeeping is being done, but maybe not.
Either way please run this and then re-run the Query Plan for the main query and see if it uses the [ix_dvr_id_servertime] index

USE [Vigil-DVR]
GO

UPDATE STATISTICS dbo.va_pos WITH FULLSCAN
GO
UPDATE STATISTICS dbo.posdata WITH FULLSCAN
GO

rkruis
Starting Member

28 Posts

Posted - 2010-02-17 : 14:52:04
Yes, unfortunately there is no chance in the near future that we will upgrade from SQL 2000 MSDE. And yes, we are close to the 2 GB limit. One table actually has closer to 2.5 million records. I am running this query on just the last few days of data, copied to my db.

Ran the full scan on both tables and the speed did not change. I know regular maintenance is being done. I looked at the Logical Scan Fragmentation; both tables are less than 5%.

In my database, I have 28,469 records and 13,466 are gas related.

What I am trying to accomplish with this select, is to get the count of rule_name(gas or diesel or ...) between a time period to show how many customers got gas, or diesel or whatever else could be logged in the rule_name field.

What I have done, is I have this select inside a stored procedure with 3 parameters. One being Rule, other two being the between dates. Which returns a count found.

I understand the difference between a seek and a scan, but I have no idea how to get the main query to do a seek :)
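For context, the stored procedure is roughly shaped like this (the procedure and parameter names here are made up for illustration; the body is the same SELECT as above):

```sql
CREATE PROCEDURE dbo.usp_CountRuleReceipts
    @Rule varchar(50),
    @StartTime datetime,
    @EndTime datetime
AS
SELECT COUNT(DISTINCT p.receipt_no) AS receipt_count
FROM posdata AS p
JOIN va_pos AS v
    ON v.register_num = p.register_num
WHERE p.server_time BETWEEN v.start_time AND v.end_time
AND v.rule_name LIKE '%' + @Rule + '%'
AND p.server_time >= @StartTime
AND p.server_time <= @EndTime
GO
```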

Kristen
Test

22859 Posts

Posted - 2010-02-18 : 03:07:10
What is the PK on va_pos? and on posdata?

Better still would be the DDL Create Table / Index for both tables.

"If [register_num] is not the PK on [posdata], or it is the PK but is not using a Clustered Index, then it needs a compound index [server_time], [register_num]"

Have you tried adding a compound index with both these columns as an experiment and seeing if that index gets used in the query plan? Only needs to be done as an experiment and can be dropped afterwards.

"I understand the difference between a seek and a scan, but to get the main query to seek, I have no idea on"

Well you could force it with a HINT. Normally doesn't improve what the query planner does though.

SELECT COUNT(distinct p.receipt_no)
FROM posdata AS p WITH (INDEX(ix_dvr_id_servertime))
JOIN va_pos AS v
ON v.register_num = p.register_num
WHERE p.server_time between v.start_time and v.end_time
and rule_name like '%gas%'
and p.server_time >= '20100215 00:00:30.000'
and p.server_time <= '20100215 12:00:00.000'

Try it also with your test compound index's name

Also, try a compound index also including [receipt_no] - i.e. [server_time], [register_num], [receipt_no]

I also favour trying with the GAS test moved out of the main query - GAS is 50% of the records, the query planner is probably taking a lousy route to solving that (but it cannot use any Statistics to know that)

SELECT COUNT(distinct receipt_no)
FROM
(
SELECT p.receipt_no, rule_name
FROM posdata AS p
JOIN va_pos AS v -- Try also WITH INDEX hint if you like
ON v.register_num = p.register_num
WHERE p.server_time between v.start_time and v.end_time
AND p.server_time >= '20100215 00:00:30.000'
AND p.server_time <= '20100215 12:00:00.000'
) AS X
WHERE rule_name like '%gas%'
   
