Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Same Execution Plans, Different Number of Rows

Author  Topic 

cmdcenter
Starting Member

15 Posts

Posted - 2009-06-25 : 03:16:27
I have got two queries both of which generate the same execution plan:

query 1:
SELECT TOP 10 *
FROM news
CROSS APPLY (SELECT TOP 1 NetworkID FROM ItemNetwork WHERE ItemID = news.ID) itemNet


query 2:
SELECT TOP 10 *
FROM news
CROSS APPLY (SELECT TOP 1 NetworkID FROM ItemNetwork WHERE ItemID = news.ID AND ItemType = 0) itemNet


ItemNetwork table has 4 columns:
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[ItemID] [bigint] NOT NULL,
[ItemType] [tinyint] NOT NULL,
[NetworkID] [int] NOT NULL


I have also created a non-clustered index on ItemNetwork table:
CREATE NONCLUSTERED INDEX [IX_ItemNetwork_ItemID_ItemType__NetworkID] ON ItemNetwork
(
[ItemID] ASC,
[ItemType] ASC
)
INCLUDE ( [NetworkID])


The first query takes one second to execute, while it takes 2 minutes for the second one to execute. The execution plan for both queries is the same. You can see the execution plan for the first query:


and for the second query:



The only difference that can be seen between the two execution plans is the amount of data that comes out of news table. For the second query, we see a very big arrow coming out of news table. That is because the actual number of rows coming out of this table is 1534672 rows while for the first query, this number is 877 rows. For both queries, the estimated number of rows is 10 (because of top 10 clause). Look at the actual number of rows for both queries:




The only difference between the two queries is this condition:
ItemType = 0


The weird thing is that adding this condition, increases the number of output rows from News table while ItemType column is in ItemNetwork table!!

I also updated the statistics for all the tables involved, but it didn't make any difference.

Could somebody please tell me how I can make the second query execute as fast as the first one?

p.s. the total number of rows in News table is 1576612 rows, in Network table 1820 rows and in ItemNetwork table 42164 rows

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-06-25 : 03:28:30
Why not?
The CROSS APPLY works as INNER JOIN in this case.
If you remove the "ItemType = 0", you have to scan one table and seek the other.
When you include the "ItemType = 0", SQL Server is smart enough to only include the correct corresponding records in the NESTED LOOP JOIN.


E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

cmdcenter
Starting Member

15 Posts

Posted - 2009-06-25 : 03:39:56
Please note that when I add ItemType = 0, I expect the query to execute faster, but it takes TWO MINUTES to execute when I add this condition! Do you know why? Why is it affecting the number of rows coming out of News table? ItmeType is a column in ItemNetwork table!
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-06-25 : 03:45:04
Try this rewrite
SELECT TOP 10	*
FROM News AS n
INNER JOIN (
SELECT ItemID,
MAX(NetworkID)
FROM ItemNetwork WITH (INDEX (IX_ItemNetwork_ItemID_ItemType__NetworkID))
WHERE ItemType = 0
GROUP BY ItemID
) AS ItemNet ON ItemNet.ItemID = n.ID



E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-06-25 : 03:45:56
Beware that using a wildcard character is bad or performance and almost always leads to a scan, since all columns are returned.


E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

cmdcenter
Starting Member

15 Posts

Posted - 2009-06-25 : 04:12:28
Thank you Peso for your clever alternative query! It's working very nice

I just removed index hint:
WITH (INDEX (IX_ItemNetwork_ItemID_ItemType__NetworkID))

And it is still using the index.

I am also aware of wildcard (*) performance and will not use it in real world.

I just don't understand what was wrong with CROSS APPLY. I really can't figure out any reason for what was happening there!

Once again I appreciate your help
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-06-25 : 04:47:43
How long does the alternative query take to run?


E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

cmdcenter
Starting Member

15 Posts

Posted - 2009-06-25 : 05:12:31
quote:
Originally posted by Peso

How long does the alternative query take to run?



In milliseconds! very fast! And here is the execution plan of your query:



I also tried your query with CROSS APPLY and it produced the same execution plan and executed very fast:

SELECT TOP 10	*
FROM News AS n
CROSS APPLY (
SELECT MAX(NetworkID) AS NetworkID
FROM ItemNetwork
WHERE ItemID = n.ID AND ItemType = 0
GROUP BY ItemID
) AS ItemNet
Go to Top of Page

cmdcenter
Starting Member

15 Posts

Posted - 2009-06-25 : 05:15:27
You know the strange thing is that the Index Scan on News table has turned into Index Seek!
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-06-25 : 05:19:26
No, that is not strange.
The index IX_ItemNetwork_ItemID_ItemType__NetworkID covers all columns needed in the derived table ItemNet.

The fact remains that the query before took 2 minutes with correct result, and now only milliseconds?


E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-06-25 : 05:24:52
The "TOP 1" statement before randomly selected one record in the ItemNetwork table, so you probably should replace "MAX(NetworkID)" with "MIN(NetworkID)", depending on the clustered index (if any) on the ItemNetwork table, to get same result as with "TOP 1".


E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

cmdcenter
Starting Member

15 Posts

Posted - 2009-06-25 : 05:40:39
quote:
Originally posted by Peso
The fact remains that the query before took 2 minutes with correct result, and now only milliseconds?



Look at the 4th image. Almost all rows are fetched out from News table. That's why it took two minutes even though the query was correct. I don't know what is the reason for that.
Go to Top of Page

waterduck
Aged Yak Warrior

982 Posts

Posted - 2009-06-25 : 06:02:34
Nice to see,
Nice to learn,
Once forgotten,
Remember sqlteam.com
(still in sleepy mode)
Go to Top of Page

cmdcenter
Starting Member

15 Posts

Posted - 2009-06-25 : 07:29:52
I think I have figured out the problem!

Here is what has happened:

Every News item, may belong to one or more Networks. So I related them in ItemNetwork table.

I used a Data Generator to populate News table with almost 1,500,000 records. But I forgot to assign a dummy network to each of them in ItemNetwork table. That's why ItemNetwork table had only 42164 records (many news items had no network)

So here is what I did now: I assigned at least one network to each news item. Therefore, ItemNetwork table now has at least 1,500,000 rows.

Now the query that I initially was complaining about, is running very fast!

Here are four equivalent queries:

SELECT TOP 10	*
FROM NewsItem AS n
INNER JOIN (
SELECT ItemID,
MAX(NetworkID) NetworkID
FROM ItemNetwork
WHERE ItemType = 0
GROUP BY ItemID
) AS ItemNet ON ItemNet.ItemID = n.ID

-------------------------------------
SELECT TOP 10 *
FROM NewsItem AS n
CROSS APPLY (
SELECT MAX(NetworkID) NetworkID
FROM ItemNetwork
WHERE ItemID = n.ID AND ItemType = 0
GROUP BY ItemID
) AS ItemNet

-------------------------------------
SELECT TOP 10 *
FROM NewsItem AS n
CROSS APPLY (
SELECT MIN(NetworkID) NetworkID
FROM ItemNetwork
WHERE ItemID = n.ID AND ItemType = 0
) AS ItemNet
-------------------------------------
SELECT TOP 10 *
FROM NewsItem news
CROSS APPLY (SELECT TOP 1 NetworkID FROM ItemNetwork WHERE ItemID = news.ID AND ItemType = 0) itemNet


and execution plans respectively:




As you can see the first two queries (24%) execute faster than the second two ones (26%)

I just do not understand one thing. Look at the 2nd query. As you may agree, there is a redundant clause in it which can be removed and I have done it in query 3:

GROUP BY ItemID

The fact is that query 2 runs faster than query 3.
Can we conclude that sometimes, doing something that does not seem necessary (in this case, the GROUP BY clause), can increase performance? I am not sure but as I remember, we should always avoid GROUP BY whenever possible. right?
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-06-25 : 07:33:36
You should also compare the four queries with SQL Profiler according to CPU, duration, reads and writes.



E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page
   

- Advertisement -