query optimization help for very large tables

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

Database Design and Application Architecture

query optimization help for very large tables

Author

Topic

rkhanna
Starting Member

3 Posts

Posted - 2007-11-01 : 00:09:46

I have the following table structure:
tableA (~85,000 rows) primary key = [colA,colB]
tableB (~850,000 rows) primary key = [colA,colC]
tableC (~120,000,000 rows) primary key = [colA,colB,colC]

IMPORTANT: colC is DATETIME

For a SET of rows in tableA (about 50,000) I need to pull the MOST RECENT (given a date) corresponding values from tables B and C. The only way I can think of doing this is the following:


SELECT tableA.colA
,(SELECT TOP 1 colX FROM tableB WHERE colA = tableA.colA AND colC <= @INPUTDATE ORDER BY colC desc)
,(SELECT TOP 1 colY FROM tableB WHERE colA = tableA.colA AND colC <= @INPUTDATE ORDER BY colC desc)
,... --some more columns from tableB
,(SELECT TOP 1 colX FROM tableC WHERE colA = tableA.colA AND colB = tableA.colB AND colC <= @INPUTDATE ORDER BY colC desc)
,(SELECT TOP 1 colY FROM tableC WHERE colA = tableA.colA AND colB = tableA.colB AND colC <= @INPUTDATE ORDER BY colC desc)
,... --some more columns from tableC
FROM tableA
WHERE tableA.colX = 'some criteria'

Is there any other way anyone can suggest? Unfortunately, because tableC is so large, the disk IO (I think) causes this query to take over an hour. (If I had monster RAM and super fast disk this wouldn't be as big an issue, but that's not an option right now

)

Thanks in advance!

madhivanan
Premature Yak Congratulator

22864 Posts

Posted - 2007-11-01 : 02:25:55

Replace top 1 subqueries with the logic like this

SELECT Max(colX) FROM tableB WHERE colA = tableA.colA AND colC <= @INPUTDATE

Madhivanan

Failing to plan is Planning to fail

khtan
In (Som, Ni, Yak)

17689 Posts

Posted - 2007-11-01 : 02:46:28

don't use sub-query. Change your query to using JOINS.

KH
[spoiler]Time is always against us[/spoiler]

Kristen
Test

22859 Posts

Posted - 2007-11-01 : 06:23:57

[code]
SELECT A.colA, B.colX, ..., C.ColZ, ...
FROM
(
SELECT C.colA, C.ColB, MAX(C.ColC) AS MAX_C
FROM TableC AS C
JOIN TableA AS A
WHERE A.colX = 'some criteria'
GROUP BY C.colA, C.ColB
) AS X
JOIN tableA AS A
ON A.ColA = X.ColA
JOIN tableB AS B
ON B.ColA = X.ColA
AND B.ColB = X.ColB
JOIN tableC AS C
ON C.ColA = X.ColA
AND C.ColB = X.ColB
AND C.ColC = X.MAX_C
WHERE A.colX = 'some criteria'
[/code]
For this to run efficiently you will need an index on TableA for ColX, ColA (if the Primary Key on TableA is clustered then ColA is [I think!!] automatically covered, in which case you won't need to include that)

Kristen

rkhanna
Starting Member

3 Posts

Posted - 2007-11-03 : 17:35:30

That worked great, thanks Kristen! I got it down to just a couple of minutes.

Kristen
Test

22859 Posts

Posted - 2007-11-04 : 02:12:01

Over an hour to a couple of minutes? That's good news indeed!

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-11-05 : 02:50:21

Are you using SQL Server 2005?