| Author |
Topic |
|
cmdcenter
Starting Member
15 Posts |
Posted - 2009-06-25 : 03:16:27
|
I have got two queries both of which generate the same execution plan:query 1:SELECT TOP 10 *FROM newsCROSS APPLY (SELECT TOP 1 NetworkID FROM ItemNetwork WHERE ItemID = news.ID) itemNet query 2:SELECT TOP 10 *FROM newsCROSS APPLY (SELECT TOP 1 NetworkID FROM ItemNetwork WHERE ItemID = news.ID AND ItemType = 0) itemNet ItemNetwork table has 4 columns:[ID] [bigint] IDENTITY(1,1) NOT NULL,[ItemID] [bigint] NOT NULL,[ItemType] [tinyint] NOT NULL,[NetworkID] [int] NOT NULL I have also created a non-clustered index on ItemNetwork table:CREATE NONCLUSTERED INDEX [IX_ItemNetwork_ItemID_ItemType__NetworkID] ON ItemNetwork( [ItemID] ASC, [ItemType] ASC)INCLUDE ( [NetworkID]) The first query takes one second to execute, while it takes 2 minutes for the second one to execute. The execution plan for both queries is the same. You can see the execution plan for the first query: and for the second query: The only difference that can be seen between the two execution plans is the amount of data that comes out of news table. For the second query, we see a very big arrow coming out of news table. That is because the actual number of rows coming out of this table is 1534672 rows while for the first query, this number is 877 rows. For both queries, the estimated number of rows is 10 (because of top 10 clause). Look at the actual number of rows for both queries: The only difference between the two queries is this condition:ItemType = 0 The weird thing is that adding this condition, increases the number of output rows from News table while ItemType column is in ItemNetwork table!!I also updated the statistics for all the tables involved, but it didn't make any difference.Could somebody please tell me how I can make the second query execute as fast as the first one?p.s. the total number of rows in News table is 1576612 rows, in Network table 1820 rows and in ItemNetwork table 42164 rows |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2009-06-25 : 03:28:30
|
Why not?The CROSS APPLY works as INNER JOIN in this case.If you remove the "ItemType = 0", you have to scan one table and seek the other.When you include the "ItemType = 0", SQL Server is smart enough to only include the correct corresponding records in the NESTED LOOP JOIN. E 12°55'05.63"N 56°04'39.26" |
 |
|
|
cmdcenter
Starting Member
15 Posts |
Posted - 2009-06-25 : 03:39:56
|
| Please note that when I add ItemType = 0, I expect the query to execute faster, but it takes TWO MINUTES to execute when I add this condition! Do you know why? Why is it affecting the number of rows coming out of News table? ItmeType is a column in ItemNetwork table! |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2009-06-25 : 03:45:04
|
Try this rewriteSELECT TOP 10 *FROM News AS nINNER JOIN ( SELECT ItemID, MAX(NetworkID) FROM ItemNetwork WITH (INDEX (IX_ItemNetwork_ItemID_ItemType__NetworkID)) WHERE ItemType = 0 GROUP BY ItemID ) AS ItemNet ON ItemNet.ItemID = n.ID E 12°55'05.63"N 56°04'39.26" |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2009-06-25 : 03:45:56
|
Beware that using a wildcard character is bad or performance and almost always leads to a scan, since all columns are returned. E 12°55'05.63"N 56°04'39.26" |
 |
|
|
cmdcenter
Starting Member
15 Posts |
Posted - 2009-06-25 : 04:12:28
|
Thank you Peso for your clever alternative query! It's working very nice I just removed index hint: WITH (INDEX (IX_ItemNetwork_ItemID_ItemType__NetworkID))And it is still using the index. I am also aware of wildcard (*) performance and will not use it in real world.I just don't understand what was wrong with CROSS APPLY. I really can't figure out any reason for what was happening there!Once again I appreciate your help |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2009-06-25 : 04:47:43
|
How long does the alternative query take to run? E 12°55'05.63"N 56°04'39.26" |
 |
|
|
cmdcenter
Starting Member
15 Posts |
Posted - 2009-06-25 : 05:12:31
|
quote: Originally posted by Peso How long does the alternative query take to run?
In milliseconds! very fast! And here is the execution plan of your query: I also tried your query with CROSS APPLY and it produced the same execution plan and executed very fast:SELECT TOP 10 *FROM News AS nCROSS APPLY ( SELECT MAX(NetworkID) AS NetworkID FROM ItemNetwork WHERE ItemID = n.ID AND ItemType = 0 GROUP BY ItemID ) AS ItemNet |
 |
|
|
cmdcenter
Starting Member
15 Posts |
Posted - 2009-06-25 : 05:15:27
|
| You know the strange thing is that the Index Scan on News table has turned into Index Seek! |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2009-06-25 : 05:19:26
|
No, that is not strange.The index IX_ItemNetwork_ItemID_ItemType__NetworkID covers all columns needed in the derived table ItemNet.The fact remains that the query before took 2 minutes with correct result, and now only milliseconds? E 12°55'05.63"N 56°04'39.26" |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2009-06-25 : 05:24:52
|
The "TOP 1" statement before randomly selected one record in the ItemNetwork table, so you probably should replace "MAX(NetworkID)" with "MIN(NetworkID)", depending on the clustered index (if any) on the ItemNetwork table, to get same result as with "TOP 1". E 12°55'05.63"N 56°04'39.26" |
 |
|
|
cmdcenter
Starting Member
15 Posts |
Posted - 2009-06-25 : 05:40:39
|
quote: Originally posted by PesoThe fact remains that the query before took 2 minutes with correct result, and now only milliseconds?
Look at the 4th image. Almost all rows are fetched out from News table. That's why it took two minutes even though the query was correct. I don't know what is the reason for that. |
 |
|
|
waterduck
Aged Yak Warrior
982 Posts |
Posted - 2009-06-25 : 06:02:34
|
| Nice to see,Nice to learn,Once forgotten,Remember sqlteam.com(still in sleepy mode) |
 |
|
|
cmdcenter
Starting Member
15 Posts |
Posted - 2009-06-25 : 07:29:52
|
I think I have figured out the problem!Here is what has happened:Every News item, may belong to one or more Networks. So I related them in ItemNetwork table.I used a Data Generator to populate News table with almost 1,500,000 records. But I forgot to assign a dummy network to each of them in ItemNetwork table. That's why ItemNetwork table had only 42164 records (many news items had no network)So here is what I did now: I assigned at least one network to each news item. Therefore, ItemNetwork table now has at least 1,500,000 rows.Now the query that I initially was complaining about, is running very fast!Here are four equivalent queries:SELECT TOP 10 *FROM NewsItem AS nINNER JOIN ( SELECT ItemID, MAX(NetworkID) NetworkID FROM ItemNetwork WHERE ItemType = 0 GROUP BY ItemID ) AS ItemNet ON ItemNet.ItemID = n.ID ------------------------------------- SELECT TOP 10 *FROM NewsItem AS nCROSS APPLY ( SELECT MAX(NetworkID) NetworkID FROM ItemNetwork WHERE ItemID = n.ID AND ItemType = 0 GROUP BY ItemID ) AS ItemNet ------------------------------------- SELECT TOP 10 *FROM NewsItem AS nCROSS APPLY ( SELECT MIN(NetworkID) NetworkID FROM ItemNetwork WHERE ItemID = n.ID AND ItemType = 0 ) AS ItemNet ------------------------------------- SELECT TOP 10 *FROM NewsItem newsCROSS APPLY (SELECT TOP 1 NetworkID FROM ItemNetwork WHERE ItemID = news.ID AND ItemType = 0) itemNet and execution plans respectively: As you can see the first two queries (24%) execute faster than the second two ones (26%)I just do not understand one thing. Look at the 2nd query. As you may agree, there is a redundant clause in it which can be removed and I have done it in query 3:GROUP BY ItemIDThe fact is that query 2 runs faster than query 3.Can we conclude that sometimes, doing something that does not seem necessary (in this case, the GROUP BY clause), can increase performance? I am not sure but as I remember, we should always avoid GROUP BY whenever possible. right? |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2009-06-25 : 07:33:36
|
You should also compare the four queries with SQL Profiler according to CPU, duration, reads and writes. E 12°55'05.63"N 56°04'39.26" |
 |
|
|
|