Author |
Topic |
sachinsamuel
Constraint Violating Yak Guru
383 Posts |
Posted - 2006-09-29 : 07:43:35
|
Hello Experts,Which one is better Join or Subquery? and Why?Thanks in advanceDon't sit back because of failure. It will come back to check if you still available. -- Binu |
|
Kristen
Test
22859 Posts |
Posted - 2006-09-29 : 07:44:43
|
Well it depends ....Describe the situation and problem please.Kristen |
 |
|
sachinsamuel
Constraint Violating Yak Guru
383 Posts |
Posted - 2006-09-29 : 08:37:20
|
I have 2 tables A - having 118040500 records B(its temporary table) - having 1500 records.When I say select * from A where a.id in (Select b.id from B)it takes almost 2 minutes to return data.How can I optimize this. Should I remove in with join?thanks in advanceDon't sit back because of failure. It will come back to check if you still available. -- Binu |
 |
|
pootle_flump
1064 Posts |
Posted - 2006-09-29 : 08:44:52
|
You would level the playing field a little if you used EXISTS for your subquery rather than IN. You might want to remove the * and replace with column names.Anyway - for a simple existance check Vs inner join the optimiser should come up with the same plan for both queries so it is probably a matter of preference.HTH |
 |
|
madhivanan
Premature Yak Congratulator
22864 Posts |
Posted - 2006-09-29 : 08:49:38
|
Set the execution plan and seeselect * from A where a.id in (Select b.id from B)select T1.* from A T1 inner join B T2 on T1.id=T2.idselect * from A T1 where exists(Select * from B where id=T1.id)MadhivananFailing to plan is Planning to fail |
 |
|
sachinsamuel
Constraint Violating Yak Guru
383 Posts |
Posted - 2006-09-29 : 08:58:32
|
I am getting same execution plan for all of them. Infact the logical reads are also same.please help.regardsSachinDon't sit back because of failure. It will come back to check if you still available. -- Binu |
 |
|
PSamsig
Constraint Violating Yak Guru
384 Posts |
Posted - 2006-09-29 : 18:04:23
|
It sounds like you dont have an index on A.id. How many rows does the query returns? If the answer is all 118040500 then I aint really surprised it takes 2 min ...-- Alice came to a fork in the road. "Which road do I take?" she asked. "Where do you want to go?" responded the Cheshire cat. "I don't know," Alice answered. "Then," said the cat, "it doesn't matter." |
 |
|
pootle_flump
1064 Posts |
|
sachinsamuel
Constraint Violating Yak Guru
383 Posts |
Posted - 2006-10-01 : 13:33:16
|
Thanks a lot experts.Unfortunatly I need all the columns there for I will have only option left is *.But I am still not clear with what is better... An inner join or a in clause?Please help.... this is urgent. Sorry to push.Thanks in advanceRegardsSachinDon't sit back because of failure. It will come back to check if you still available. -- Binu |
 |
|
PSamsig
Constraint Violating Yak Guru
384 Posts |
Posted - 2006-10-01 : 14:01:58
|
Since you your self answered I am getting same execution plan for all of them. Infact the logical reads are also same. then that clears that question. Now what is left is to determin if you ASK the right question, and under the right circumstances, but you have given zero to non information to help you there ...-- Alice came to a fork in the road. "Which road do I take?" she asked. "Where do you want to go?" responded the Cheshire cat. "I don't know," Alice answered. "Then," said the cat, "it doesn't matter." |
 |
|
pootle_flump
1064 Posts |
Posted - 2006-10-01 : 17:57:54
|
Assuming ID is indexed in both tables (ideally uniquely but that depends on the schema) then this is as fast as it can be unless you can sacrifice accuracy for performance. For example, you could use join hints or write the results of the query to a table at off peak times (especially if the contents of Table B don't change often). Maybe try using a table variable or derived table? Do you know if it is the population of the temp table or the joining to the primary table that takes up the majority of the execution time? Are you only dumping the ID into the temp table or are you also grabbing superfluous columns?Tell you what - why not post all the SQL?HTH |
 |
|
snSQL
Master Smack Fu Yak Hacker
1837 Posts |
Posted - 2006-10-01 : 18:50:08
|
As a general rule you should use a join rather than an in (but the only way to be certain in a specific situation is to look at the execution plan, you cannot say one is always better). But, the key question here is, do you really need all the rows from the query? You do not have a where clause so you are returning all the rows from A that match anything in B. If there are a lot of rows and especially if most of the rows in A have an id in B then you aren't going to speed it up. Most likely though, you don't really want ALL of the rows that match EVERY row in B, so your query should be something like thisSELECT * FROM AINNER JOIN B ON A.id = B.idWHERE B.somecolumn = somevalue And - as a few people have already asked do you have an index on at least the id columns in A and B? |
 |
|
Kristen
Test
22859 Posts |
Posted - 2006-10-02 : 08:55:47
|
"As a general rule you should use a join rather than an in (but the only way to be certain in a specific situation is to look at the execution plan, you cannot say one is always better)."You did say general but I would add one caveat:If the JOIN can provide multiple hits you will probably want to use IN instead, to prevent having to use DISTINCT.Assuming you have the indexes mentioned above (and you haven't said one way or the other yet) then:select * from A where a.id in (Select b.id from B)A - having 118,040,500 records B(its temporary table) - having 1500 records.it takes almost 2 minutes to return data.how many rows are being returned? Because if its lots then it will take some time to physically ship them from the database to your application. Particularly if your SELECT * is covering a bung of TEXT columns!How long doesselect COUNT(*) from A where a.id in (Select b.id from B)take to run by comparison? That will remove the transfer time from the equation (but might use a smarter/shorter query plan that is not realistic for the main query, of course)Kristen |
 |
|
pootle_flump
1064 Posts |
Posted - 2006-10-02 : 09:27:34
|
You could try COUNT(AnUnindexedAndAlsoNeverNullColumn)to hobble the optimiser a bit |
 |
|
Kristen
Test
22859 Posts |
Posted - 2006-10-02 : 10:10:53
|
"COUNT(AnUnindexedAndAlsoNeverNullColumn)"COUNT(MyCol1+MyCol2+...) |
 |
|
madhivanan
Premature Yak Congratulator
22864 Posts |
Posted - 2006-10-02 : 10:51:33
|
>>COUNT(MyCol1+MyCol2+...) Wont that lead to error "Cant convert to int datatype" if columns are of different datatypes?MadhivananFailing to plan is Planning to fail |
 |
|
pootle_flump
1064 Posts |
Posted - 2006-10-02 : 11:30:58
|
... and null if one is null?COUNT(COALESCE(CAST(MyCol1 AS VarChar(8000)), '') + COALESCE(CAST(MyCol2 AS VarChar(8000)), , '') + ...) This could end up on DailyWTF.... |
 |
|
madhivanan
Premature Yak Congratulator
22864 Posts |
Posted - 2006-10-02 : 11:43:28
|
>>This could end up on DailyWTF.... MadhivananFailing to plan is Planning to fail |
 |
|
|