Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Table scan question -STILL OPEN-FINALTRY

Author  Topic 

Vinnie881
Master Smack Fu Yak Hacker

1231 Posts

Posted - 2010-03-04 : 13:15:49
Can someone please explain what the best procedure is.



Declare @Dates table (MyDate datetime)
Declare @Invoice table (DateReceived datetime,InvoiceDate datetime)



insert into @Dates
select '1/1/2009' Union all
select '2/1/2009' Union all
select '3/1/2009' Union All
select '4/1/2009' Union all
select '5/1/2009' Union all
select '6/1/2009' Union all
select '7/1/2009' Union all
select '8/1/2009' Union all
select '9/1/2009' Union all
select '10/1/2009' Union all
select '11/1/2009' Union all
select '12/1/2009' Union all
select '1/1/2010' Union all
select '2/1/2010'


insert into @Invoice
select '1/1/2009','1/1/2009' Union all
select '2/1/2009','2/1/2009' Union all
select '8/1/2009','3/1/2009' Union all
select '8/1/2009','4/1/2009' Union all
select '8/1/2009','5/1/2009' Union all
select '2/1/2010','2/1/2010'


select * from
(
select Row_Number() over (Partition by b.DateReceived order by b.invoicedate) as RowID,*
from
@Invoice b
) a
Left Join
@Dates b
on a.RowiD =1
and a.DateReceived = b.MyDate



select * from
(
select Row_Number() over (Partition by b.DateReceived order by b.invoicedate) as RowID,*
from
@Invoice b
) a
Left Join
@Dates b
on a.DateReceived = b.MyDate
where a.RowiD =1



Obviously the first query does not work properly, and I know why.

My question is if I use the second query, is the sql engine going to filter out all rowID's greater than 1 prior to linking, or is it going to link and once the results are returned then filter out the data.

The reason why I ask is in the production query there are Millions's of records that willhave rowID's greater than 1, and that is a lot of extra work on the server if sql is going to process this way.

Is the proper way to write the above query



Select *
from
(
select * from
(
select Row_Number() over (Partition by a.DateReceived order by a.invoicedate) as RowID,*
from
@Invoice a
) aa
where aa.RowID = 1
) aa
Left Join
@Dates bb
on aa.DateReceived = bb.MyDate


This is a general practice question, so please don't just say check the execution plan. I am interested in just knowing as a general practice.

Thanks


Success is 10% Intelligence, 70% Determination, and 22% Stupidity.
\_/ _/ _/\_/ _/\_/ _/ _/- 881

DP978
Constraint Violating Yak Guru

269 Posts

Posted - 2010-03-04 : 14:52:15
Almost positive that in any case you are going to get the entire table read once to find the row numbers. That does not seem like something that would be stored along side the table, no matter how well indexed or organized.

Pretty good question though, although you are going to have to wait for an expert to comment further :)
Go to Top of Page

Vinnie881
Master Smack Fu Yak Hacker

1231 Posts

Posted - 2010-03-06 : 21:33:36
The only way I can think of to avoid this issue is to create a tmp table.



select * from
into #TMP -- in a live scenerio I would index this properly.
(
select Row_Number() over (Partition by a.DateReceived order by a.invoicedate) as RowID,*
from
@Invoice a
) aa
where aa.RowID = 1


Select *
from
#TMP aa
Left Join
@Dates bb
on aa.DateReceived = bb.MyDate


Is there anyway to achieve these results without creating a TMP table so there is not unessasary work being put on the server like in the examples form my original post?



Success is 10% Intelligence, 70% Determination, and 22% Stupidity.
\_/ _/ _/\_/ _/\_/ _/ _/- 881
Go to Top of Page

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2010-03-08 : 16:00:04
Hard to say for absolute certainty what the optimizer will do. But, there should not be any difference in execution between your two scenarios (filtering then joining and joining then filtering). The optimizer *should* be smart enough to apply the filter before the join.
Go to Top of Page

Vinnie881
Master Smack Fu Yak Hacker

1231 Posts

Posted - 2010-03-08 : 16:03:16
Would the best practice be to use the TMP table scenerio then to ensure the path taken?


Success is 10% Intelligence, 70% Determination, and 22% Stupidity.
\_/ _/ _/\_/ _/\_/ _/ _/- 881
Go to Top of Page

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2010-03-08 : 16:11:39
Unfortunatly, there are not always absoluts. I would NOT use a temp table unless there was performance to be gained from using one. In general, a temp table is not going to help your performance on a well crafted DB. But, there are times when they can be of bennefit. They only way to tell for sure is to try both ways. But, I would go with without the temp table unless performance is bad.
Go to Top of Page

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2010-03-09 : 04:22:17
also -- making a temp table each time will almost certainly mean the query plan will be recompiled every time you run.

Maybe you may want to change the select * in the innermost derived table

select Row_Number() over (Partition by a.DateReceived order by a.invoicedate) as RowID,*
from
@Invoice a

Into a SELECT of the row_number and the primary key only for that table and then join that to the parent table. You'll always get at least one scan though because of the row number (you have to scan all the rows to make up the ranking).

You could also put this ranking derived table into a CTE or VIEW but you'll always get at least one scan I think.


Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION
Go to Top of Page
   

- Advertisement -