| Author |
Topic |
|
pelegk2
Aged Yak Warrior
723 Posts |
Posted - 2010-03-17 : 15:43:40
|
| i have a table with 170million rows (tableA),and a second table which have about 500,000 rows(tableB).i want to compare a certain column from tableA (type bigint)to check if a value of it exist in tableB.i tried to do this using inner join, making a select with IN expression, but it runs for hours.any idea how i can improve te result?(the column from tableA dosent have an index, and the column from tableB is defined as primary key)ThanksPeleg |
|
|
webfred
Master Smack Fu Yak Hacker
8781 Posts |
Posted - 2010-03-17 : 15:57:50
|
I would create the needed index. No, you're never too old to Yak'n'Roll if you're too young to die. |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-03-17 : 16:19:40
|
" i tried to do this using inner join, making a select with IN expression, but it runs for hours."Is it a JOIN or is it an IN clause?An IN clause may make a Sort and DISTINCT on the whole of TableB A JOIN would not have to do that, and if the value is the PK in TableB then a BookMark Lookup, or equivalent, should eb very quick (even if the Value column is not indexes in TableA)(In writing that I realise that the Value Column is the PK for TableB, and thus QueryN Optimiser should not consider needing to do ORDER BY and DISTINCT as the values in TableB must be unique ... Best to look at the Query Plan and see what's what).I have known EXISTS and IN produce disastrous query plans, where JOIN does not.But JOIN will return multiple records if there are duplicates on TableB of course - which EXISTS and IN will not do. |
 |
|
|
pelegk2
Aged Yak Warrior
723 Posts |
Posted - 2010-03-18 : 04:42:29
|
| i used the Exists and it seems to help,the problem is that i split the compare into bulk of 1 million rows each times(which take 1.18 min), but when i try to make a query with a parameter (@minid), the time increases select distinct ReasonRef from Con_RecipientToMessages (nolock)where Con_RecipientToMessages.Id>=@minid and Con_RecipientToMessages.Id<= @minid+1000000and exists (select ReasonRef from #tblReasonList(nolock) where Con_RecipientToMessages.ReasonRef=#tblReasonList.ReasonRef)what can i do with the query where i use a variable to improve performance? |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-03-18 : 05:02:35
|
Does a JOIN perform differently?select distinct RTM.ReasonReffrom Con_RecipientToMessages (nolock) AS RTM JOIN #tblReasonList(nolock) AS RL ON RL.ReasonRef = RTM.ReasonRef where RTM.Id >= @minid and RTM.Id <= @minid+1000000 Which is the smaller table? If I understood your first message correctly #tblReasonList is the smaller table and ReasonRef is the PK?If so thenSELECT RL.ReasonRefFROM #tblReasonList(nolock) AS RLWHERE EXISTS( SELECT * FROM Con_RecipientToMessages (nolock) AS RTM WHERE RTM.ReasonRef = RL.ReasonRef AND RTM.Id >= @minid AND RTM.Id <= @minid+1000000) as this avoids the Distinct.I don't like to see those NOLOCK there - serious possibility of highly undesirable side effects. Especially on the #tblReasonList - no other process can be referencing that, other than you, surely??If you have blocking / deadlock issues use READ_COMMITTED_SNAPSHOT instead. It has none of the disastrous side effects of NOLOCK and all the upside - u9nless you have some special circumstances. |
 |
|
|
pelegk2
Aged Yak Warrior
723 Posts |
Posted - 2010-03-18 : 05:23:38
|
| Kristen first thanks for the tips,the thing that i want from this query is check if exist any value from one table inthe other table (it dosen't realy matter the value itselfor how many will be find)for the ReasonRef - it's not a PK, but current i am building an index,so maybe it will improve the performance. |
 |
|
|
pelegk2
Aged Yak Warrior
723 Posts |
Posted - 2010-03-18 : 06:03:22
|
| Thanks Kristenthe second query+the index improved it to 50-110 seconds search over 1million rows:)the thing is that with parameters the query is still slower - any idea? |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-03-18 : 06:16:55
|
| Get rid of those NOLOCKs please! |
 |
|
|
pelegk2
Aged Yak Warrior
723 Posts |
Posted - 2010-03-18 : 06:50:17
|
quote: Originally posted by Kristen Get rid of those NOLOCKs please!
from the temporary table (i did so) or from the second one as well? |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-03-18 : 07:38:51
|
| UP to you.Are you happy that some rows from Con_RecipientToMessages may not be included at all?That is possible with NOLOCK, and that possibility is the case everywhere you use it - so if you have that in your code in lots of places they all have that risk. NOLOCK can also cause a row to be included twice (won't be a problem here as you are using DISTINCT)Why do you use it? Like I said, if you having blocking/deadlock problems then use READ_COMMITTED_SNAPSHOT instead. |
 |
|
|
|