| Author |
Topic |
|
muratos
Starting Member
22 Posts |
Posted - 2011-07-06 : 09:12:09
|
| Hi,We try to join two tables and in no way make it very fast. table_a has around 35 million rows. table_b has just 25. The basic structure as follows and the query is very simple.select ID,hotel from table_a inner join table_b on table_a.accID=table_b.accID whereprice<1000 and price>100flightdate<'20110930' and flightdate>'20110901'order by price.table_aID inthotel intaccID intprice decimal(9,2)flightdate datetable_baccID inttable_a has composite clustered index on price, accID fields.table_b has on accID.table_a has nonclustered index on price, flightdate and other fields are in INCLUDE.PROBLEM is:It is rather slow with order by price on accID join.If we change the clustered index to accID, price ; this time the join with other table becomes faster but order by price becomes real slow.The execution plans are always done by Index Scans or Seeks according to index force. But, still very slow.Is there something we can change or better make?p.s : It is stated on forums that we should always include order by in the queries and not trust clustered indexes eventhough it is indexed according to this field. If not applied, the engine can bring different rows each time according to execution plan choice. |
|
|
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts |
Posted - 2011-07-06 : 09:19:44
|
| How selective is accID in table_aHow big is the resultset? You need an index on table_a.accIDIf it is selective then it can just be that column as it will use the clustered index to get the other columns which should be quick anyway.I would consider clustering on table_a.ID for this table as it will make other indexes more efficient.AccID include hotel price flightdateIt also depends on how many rows are returned from the filter on table_a==========================================Cursors are useful if you don't know sql.SSIS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
|
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts |
Posted - 2011-07-06 : 09:37:59
|
| table_a - it gives the table structures.I'm guessing that table_b is just to give the accID to return.==========================================Cursors are useful if you don't know sql.SSIS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
|
muratos
Starting Member
22 Posts |
Posted - 2011-07-06 : 09:40:24
|
Thanks for the help :)accID has distinct 1400 values. I think it is not selective when we consider 35m rows, right?We need only 100 rows but we have to order all according to price to get the cheapest ones at first. (actually, we will use row_number over(order by price) as well for web sorting, but didn't come to that level yet )price and flightdate field filters can even cover all table rows according to parameter search usage. So, no input range estimates here.as far as I understand, your offer is to use table_a.ID as clustered index and define table_a.accID as nonclustered one with INCLUDE hotel price flightdate.Will it not make order by price troublesome? |
 |
|
|
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts |
Posted - 2011-07-06 : 09:50:31
|
| The order by should not be a problem as you only have to sort your 100 row resultset.select ID,hotelfrom table_ajoin table_bon table_a.accID=table_b.accIDwhere price<1000and price>100and flightdate<'20110930'and flightdate>'20110901'ignore the order by for the momentwhat do these returnselect count(*)from table_ajoin table_bon table_a.accID=table_b.accIDselect count(*)from table_awhere price<1000and price>100and flightdate<'20110930'and flightdate>'20110901'select count(*)from table_awhere price<1000and price>100select count(*)from table_awhere flightdate<'20110930'and flightdate>'20110901'==========================================Cursors are useful if you don't know sql.SSIS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
|
Sachin.Nand
2937 Posts |
Posted - 2011-07-06 : 09:50:52
|
| What I would suggest is 1) Create a Clustered Index on accID for table_a2)Have non clustered index on price and flightdate which covers ID,hotel 3)Create a non clustered Index on accID for table_bLooks like the order by for the column "price" might not be getting enough memory and spilling over to tempdb.Use the profiler and just select the Warning options so that it only shows the sort spills onto the tempdb while the query is running.Also can you post the query plan ?PBUH |
 |
|
|
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts |
Posted - 2011-07-06 : 10:00:01
|
| >> 1) Create a Clustered Index on accID for table_a????==========================================Cursors are useful if you don't know sql.SSIS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
|
Sachin.Nand
2937 Posts |
Posted - 2011-07-06 : 10:08:03
|
| Because it will force the optimiser to go for a merge join on the inner table which will sort the data for the inner table.Does not make sense for keeping ID as a clustered index as it is no where being referenced in the above query.PBUH |
 |
|
|
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts |
Posted - 2011-07-06 : 10:12:57
|
| accID only has 1400 unique values so it's not clear whether the join to table_a is very selective or useful.Clustering on this would mean that a rowid needs to be includied in all other indexes making thyem less efficient and wasting space.Could be that the filters on table_a reduce the resultset and this is what needs to be optimised - in any case Iit's not cloear whether the join is actually an issue.==========================================Cursors are useful if you don't know sql.SSIS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
|
muratos
Starting Member
22 Posts |
Posted - 2011-07-06 : 10:14:23
|
@nigelrivettActually, I should sort first and get the cheapest 100 rows from 35m rows. Getting any 100 rows and sorting them is not what I want. I will send the counts now.@Sachin.NandI have simplified the tables here. But, I will get query plan from the real tables and send anyway.@nigelrivett and @Sachin.NandI will apply your index suggestions today and send the results.p.s: Index changes take too much time |
 |
|
|
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts |
Posted - 2011-07-06 : 10:20:28
|
| Putting the clustered index on ID will make it faster to create non-clustered indexes as well as making them more efficient.>> Actually, I should sort first and get the cheapest 100 rows from 35m rows. Getting any 100 rows and sorting them is not what I want. That's not what you postedDo you really haveselect top 100 ID,hotelfrom table_ajoin table_bon table_a.accID=table_b.accIDwhere price<1000and price>100and flightdate<'20110930'and flightdate>'20110901'order by priceIf so then how many rows are you sorting to get the result - i.e. how many rows without the top 100?Might be better to think of it (and maybe code it) as select top 100 ID,hotelfrom table_awhere price<1000 and price>100and flightdate<'20110930' and flightdate>'20110901'and accID in (select accID from table_b)order by priceNote your query will need to get ID, price order them get the top 100 then get the data for that 100 ID's (that's if you cluster on ID).You can then build non-clustered indexes to suit the filter and include price.To design the non-clustered indexes we need to know the number of rows returned by the filters.==========================================Cursors are useful if you don't know sql.SSIS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
|
muratos
Starting Member
22 Posts |
Posted - 2011-07-06 : 10:34:23
|
| The inner join returned 800K rows count. So, I should order them by price and get the top 100.Unfortunately, we can not guess the input for price and flightdate. As I stated, a user can select a month or a day for flightdate range or any price. p.s : This is a monthly table.So, this query would return all rows from the tableselect count(*)from table_awhere flightdate<'20110930'and flightdate>'20110901' |
 |
|
|
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts |
Posted - 2011-07-06 : 10:52:42
|
| In that case I would consider clustered index on ID, non_clustered on ACCID including flightdate, pricethen tryselect top 100 id, pricefrom table_awhere accID in (select accID from table_b)order by priceCheck the query plan to make sure it is using the index and doing a lookup on ACCID followed by a filterand see how long that takes - that should be close to the maximum time taken as the filters will be on that resultset using data included in the index.If that's ok it's just a matter of coding so that the index is used.==========================================Cursors are useful if you don't know sql.SSIS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
|
muratos
Starting Member
22 Posts |
Posted - 2011-07-06 : 11:32:33
|
| @nigelrivettYou are right. Now it is much faster under 1 sec, with order by price. (without price, filters yet)table_a (35m rows) uses nonclustered index seek. 8%table_b (25 rows) uses clustered index scan. 0%thennested loops (inner join) 4%sort (top n sort) 88%I have created ID as clustered and accID as nonclustered with all other columns in INCLUDE.I will check how where filters affect the results |
 |
|
|
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts |
Posted - 2011-07-06 : 11:37:30
|
| If they cause an issue you can separate the query into parts force the plan as you woulk like.Is this part of a stored procedure? It looks like something that feeds a report - an SP makes things a lot easier to handle.The saving is probably because it was previously trying to sort a wide resultset - now it just has price and id.==========================================Cursors are useful if you don't know sql.SSIS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
|
muratos
Starting Member
22 Posts |
Posted - 2011-07-06 : 11:56:44
|
| Yes, it is in an sp. I think we will dynamically force plans according to the filter input range. It looks like it makes changes in execution times as well. |
 |
|
|
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts |
Posted - 2011-07-06 : 12:41:48
|
| I would log the start and end of the sp and parameters so you can see if there are any problems and if they correspond to different parameters or other things going on at the time.==========================================Cursors are useful if you don't know sql.SSIS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
|
ankie
Starting Member
2 Posts |
Posted - 2011-07-06 : 22:58:07
|
| Hi,I suspect the cause of the slowness is due to the order of your sql criteria. try this instead.select ID,hotel from table_a inner join table_b on table_a.accID=table_b.accIDwhere flightdate>'20110901' and flightdate<'20110930'and price>100 and price<1000order by priceor this:select ID,hotel from table_a inner join table_b on table_a.accID=table_b.accIDwhere price>100 and price<1000and flightdate>'20110901' and flightdate<'20110930'order by price |
 |
|
|
muratos
Starting Member
22 Posts |
Posted - 2011-07-07 : 02:51:37
|
| @ankieSql engine decides which order to follow according to statistics no matter how we write them. Although it is possible to force order by a hint, not recommended.OPTION (FORCE ORDER) |
 |
|
|
|