Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 not in vs in

Author  Topic 

esthera
Master Smack Fu Yak Hacker

1410 Posts

Posted - 2007-02-26 : 13:15:42
Shouldn't not in be the opposite as in

i'm doing in

select * from rep where d in(select distinct reg from dv)
this seems to work
but if i do
select * from rep where d not in(select distinct reg from dv)
-- it doesn't return the opposite of the first. (it's empty when really there are records that are not in the second table.

am i doing something wrong -- is there a better way to do this with a join to get all records from one table that are not in another table?

Kristen
Test

22859 Posts

Posted - 2007-02-26 : 13:26:27
If [d] is NULL then it won't match either of those two queries - could that be it?

Kristen
Go to Top of Page

esthera
Master Smack Fu Yak Hacker

1410 Posts

Posted - 2007-02-26 : 13:43:09
n d is never null
Go to Top of Page

mahesh_bote
Constraint Violating Yak Guru

298 Posts

Posted - 2007-02-27 : 05:27:51
quote:
Originally posted by esthera

Shouldn't not in be the opposite as in

i'm doing in

select * from rep where d in(select distinct reg from dv)
this seems to work
but if i do
select * from rep where d not in(select distinct reg from dv)
-- it doesn't return the opposite of the first. (it's empty when really there are records that are not in the second table.

am i doing something wrong -- is there a better way to do this with a join to get all records from one table that are not in another table?



who told u that, the result of NOT IN has to be opposite as IN? it depends on your data

Mahesh
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2007-02-27 : 08:42:17
The issue is not rep.D being null, it's when dv.reg is null.

What does

Select * from reg where dv is null

return ?

If it returns rows, then there's your problem.

Example:


select 'ok'
where 3 not in (1,null,2)

----

(0 row(s) affected)



Even though 3 is not equal to 1, 2 or Null, nothing is returned. That is because we cannot conclusively say that 3 is not equal to NULL, since NULL is unknown.

If rep.D is never null, then simply exclude NULL values from your subquery.

My preferred way to exclude certain values from a SELECT is with JOINS. IN general, whenever you can use a JOIN, it is clearer and usually performs better rather than embedding subqueries and long IN() expressions and so on.

select rep.*
from rep
left outer join
(select distinct reg from dv) x
on rep.d = x.reg
where x.reg is null


- Jeff
http://weblogs.sqlteam.com/JeffS
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2007-02-27 : 08:58:00
How do you rate that compared to EXISTS?

If REG was unique in DV I would use a JOIN. If not I would use an EXISTS (to avoid getting multiple rows).

I'm kinda assuming that EXISTS only has to find the first entry to know that it exists, whereas SELECT DISTINCT is probably going to find them all, sort them, de-dupe the list, and go from there.

But I may be way off the ball over here ....

Kristen
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2007-02-27 : 09:16:57
quote:

I'm kinda assuming that EXISTS only has to find the first entry to know that it exists, whereas SELECT DISTINCT is probably going to find them all, sort them, de-dupe the list, and go from there.



I'd have to check the execution plan, in general I prefer joins because:

a) JOINS are good overall! Too many people never learn them and we see the infamous "WHERE IN ( WHERE EXISTS (WHERE NOT IN (...) ) ) )" pattern, which is horrible!

b) Compared to IN(), you can use more than 1 column to relate the tables, which you cannot do with IN. This, again, leads to issues where people don't understand that a table can have more than 1 column in the PK, leading to identities everywhere.

c) You can return any column you want from the join'ed table, unlike with an IN() or EXISTS() clause, if necessary.

But mainly:

d) I personally don't like the way that subqueries often need to "reference back" to the main query ; I feel that once something is in parens, the scope of that should remain separate from the outer query. (i.e. the way derived tables work, or functions or sub-classes in OOP). I just feel that it is cleaner and easier to follow and makes more sense than having a SELECT in an EXISTS() clause reference back out of the EXISTS() clause . Maybe that's the OOP programmer in me. (I can clarify this one if necessary -- i probably didn't explain that very well).

As for performance, remember that we can say that maybe EXISTS() can stop at the first thing it finds, but it needs to do that search over and over for *each* row in the main SELECT -- not just once. So, it may be no faster or slower than just getting the data all at once and sorting it and getting that ready to go. The optimizer probably evaluates things essentially the same anyway.

- Jeff
http://weblogs.sqlteam.com/JeffS
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2007-02-27 : 12:24:02
Thanks Jeff. Informative as always.
Go to Top of Page

esthera
Master Smack Fu Yak Hacker

1410 Posts

Posted - 2007-02-27 : 12:36:36
thanks for all your help.. my problem was fixed when I did select distinct.
any good tutorial you suggest for comparing the different kinds of joins?
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2007-02-27 : 14:14:24
I say -distinct- is an anti-pattern!

(and I am more inclined to use exists than join when it comes to set differentiating)

join, [not] exists, [not] in, except, intersect are just set operators that are available to us in 2005.
develop a feeling for manipulating sets and the choice will come naturally.

I don't know about any tutorials, but keep experimenting, and read in bol about the mentioned key terms.


rockmoose
Go to Top of Page

Jeff Moden
Aged Yak Warrior

652 Posts

Posted - 2007-02-28 : 00:34:41
EXISTS requires the use of a correlated subquery which can be a form of RBAR... how does that compare to the performance of a normal inner join?

--Jeff Moden
Go to Top of Page

Jeff Moden
Aged Yak Warrior

652 Posts

Posted - 2007-02-28 : 00:36:51
quote:
Originally posted by mahesh_bote



who told u that, the result of NOT IN has to be opposite as IN? it depends on your data

Mahesh



I should would like to see an example of a simple query where IN is not the opposite of NOT IN on a normalized table (no nulls). Unless that's the difference you were talking about...

--Jeff Moden
Go to Top of Page

esthera
Master Smack Fu Yak Hacker

1410 Posts

Posted - 2007-02-28 : 00:56:19
I say -distinct- is an anti-pattern!

what do you mean?
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2007-02-28 : 02:25:27
> EXISTS requires the use of a correlated subquery which can be a form of RBAR... how does that compare to the performance of a normal inner join?

RBAR...?
The plans have to be compared.
ime, EXISTS performs very well.
In this simple example, exists is the cheapest on my pc (2 less logical reads than join).
But don't quote me as the guy who said exists is ALWAYS better, I'm just saying that SQL Server is good at habdling queries with exists.
use northwind
go

set statistics io on

select * from customers
where customerid not in(select customerid from orders)

select * from customers
where customerid not in(select distinct customerid from orders)

select * from customers c
left join (select customerid from orders) o
on c.customerid = o.customerid
where o.customerid is null

select * from customers c
left join (select distinct customerid from orders) o
on c.customerid = o.customerid
where o.customerid is null

select * from customers c
where not exists(select * from orders o where c.customerid = o.customerid)


> I say -distinct- is an anti-pattern!
Many queries with distinct can be rewritten in a more performant way.
All too often is it thrown in due to lazyness or lack of proficiency in other techniques.
The essence is that results should be filtered as early as possible in a query, not at the end.

rockmoose


Go to Top of Page

mahesh_bote
Constraint Violating Yak Guru

298 Posts

Posted - 2007-02-28 : 03:31:51
quote:
Originally posted by Jeff Moden

quote:
Originally posted by mahesh_bote



who told u that, the result of NOT IN has to be opposite as IN? it depends on your data

Mahesh



I should would like to see an example of a simple query where IN is not the opposite of NOT IN on a normalized table (no nulls). Unless that's the difference you were talking about...

--Jeff Moden



Jeff, if u r talking about the NO NULL values, Normalized Table, it will give us the exact opposite results in case of NOT IN and IN. But tell me if ur table have NULLS, ur table in not normalized, does it possible to get exact opposite o/p in case of NOT IN v/s IN?

Mahesh
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2007-02-28 : 08:17:45
quote:
Originally posted by rockmoose
> I say -distinct- is an anti-pattern!
Many queries with distinct can be rewritten in a more performant way.
All too often is it thrown in due to lazyness or lack of proficiency in other techniques.
The essence is that results should be filtered as early as possible in a query, not at the end.



I could not agree more, it's a horrible pattern .... any time you see an even slightly complicated SELECT with a DISTINCT in it, a red flag is raised that says "this query is not written correctly."

- Jeff
http://weblogs.sqlteam.com/JeffS
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2007-02-28 : 16:23:30
"But tell me if ur table have NULLS, ur table in not normalized, does it possible to get exact opposite o/p in case of NOT IN v/s IN"

Yes, nulls and IN is very fishy...
Every time you use IN or NOT IN you have to know if NULL is involved...

select * from (select 1) as a(a) where a in(select 2 union select null)
select * from (select 1) as a(a) where a not in(select 2 union select null)

a
-----------

(0 row(s) affected)

a
-----------

(0 row(s) affected)


Or put another way, every time a column allows NULL, you have to be careful about how how you program, and cater for that special case...

rockmoose
Go to Top of Page

mahesh_bote
Constraint Violating Yak Guru

298 Posts

Posted - 2007-03-01 : 02:11:03
quote:
Originally posted by rockmoose

"But tell me if ur table have NULLS, ur table in not normalized, does it possible to get exact opposite o/p in case of NOT IN v/s IN"

Yes, nulls and IN is very fishy...
Every time you use IN or NOT IN you have to know if NULL is involved...

select * from (select 1) as a(a) where a in(select 2 union select null)
select * from (select 1) as a(a) where a not in(select 2 union select null)

a
-----------

(0 row(s) affected)

a
-----------

(0 row(s) affected)


Or put another way, every time a column allows NULL, you have to be careful about how how you program, and cater for that special case...

rockmoose




hay rock, thx dear.

Mahesh
Go to Top of Page

pootle_flump

1064 Posts

Posted - 2007-03-01 : 08:19:38
quote:
Originally posted by jsmith8858

quote:
Originally posted by rockmoose
> I say -distinct- is an anti-pattern!
Many queries with distinct can be rewritten in a more performant way.
All too often is it thrown in due to lazyness or lack of proficiency in other techniques.
The essence is that results should be filtered as early as possible in a query, not at the end.



I could not agree more, it's a horrible pattern .... any time you see an even slightly complicated SELECT with a DISTINCT in it, a red flag is raised that says "this query is not written correctly."

Does that then not recommend the use of EXISTS if you want to check for the existance of a record in the many table? Use a JOIN and you will require DISTINCT.

I like exists. I have (with some very complex queries) found it perform better than a join. But in any event I typically prefer to avoid joining to tables from which I am selecting no data as I find it clutters up the query. If you see an exists statement you know exactly what the code is doing and can "segment" it from everything else. YMMV of course.
Go to Top of Page

harsh_athalye
Master Smack Fu Yak Hacker

5581 Posts

Posted - 2007-03-01 : 08:37:42
There are pros and cons with every alternative. EXISTS eliminates NULL problem but it requires correlated subquery.

Harsh Athalye
India.
"The IMPOSSIBLE is often UNTRIED"
Go to Top of Page

pootle_flump

1064 Posts

Posted - 2007-03-01 : 09:35:49
quote:
Originally posted by harsh_athalye

EXISTS eliminates NULL problem but it requires correlated subquery.


Not necessarily But what is the con anyway?
Go to Top of Page
    Next Page

- Advertisement -