select where not in (...) - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

Transact-SQL (2005)

select where not in (...)

Author

Topic

ias0nas
Starting Member

36 Posts

Posted - 2007-10-12 : 04:43:05

Hello,

I am using SQL Server 2005.

I have this query:
Select count(distinct username) from accesscategorylink, sitesuser
where accesscategoryid in ('482')
and accesscategorylink.userid = sitesuser.sitesuserid
and sitesuser.acceptnewsletters = '1' and userid not in (
Select iduser from sentnewsletter where sentnewsletter.idnewsletter = '92' and sentnewsletter.downloaded = '1')

This query gives me a time-out.

The embeded select returns only one value. If I remove the last clause the query runs fine in less than a second:
Select count(distinct username) from accesscategorylink, sitesuser
where accesscategoryid in ('482')
and accesscategorylink.userid = sitesuser.sitesuserid
and sitesuser.acceptnewsletters = '1'

Also, If I run the embeded select that returns one value it also excecutes in less than a second.
Select iduser from sentnewsletter where sentnewsletter.idnewsletter = '92' and sentnewsletter.downloaded = '1'

So I am only having the problem with the full query.
Any ideas..? is there a way of replacing the not in (..) ?

Thank you

Kristen
Test

22859 Posts

Posted - 2007-10-12 : 05:06:33

I don't suppose it will make a different, but you could try NOT EXISTS instead:


and userid not in (
Select iduser from sentnewsletter where sentnewsletter.idnewsletter = '92' and sentnewsletter.downloaded = '1')
change to
and NOT EXISTS (
Select * from sentnewsletter where sentnewsletter.idnewsletter = '92' and sentnewsletter.downloaded = '1'
    AND accesscategorylink.userid = sentnewsletter.iduser)

you may get the same query plan ...

You probably need to improve the sentnewsletter table.

Make sure that table has a Primary key and a Clustered Index (usually the PK, but doesn't have to be)

Make sure you have an index which helps with the sentnewsletter query. That probably means an index on:

iduser, idnewsletter, downloaded

check this though:


SELECT [iduser] = COUNT(DISTINCT iduser),
       [idnewsletter] = COUNT(DISTINCT idnewsletter)

If idnewsletter > iduser then it is more "selective", in which case the key order for the index should be switched to

idnewsletter, iduser, downloaded

Kristen

Kristen
Test

22859 Posts

Posted - 2007-10-12 : 05:08:18

"If I run the embeded select that returns one value it also excecutes in less than a second."

Note that that might only be because the data was cached at the time you ran it, rather than that query is well optimised!

Kristen

hrishi_des
Yak Posting Veteran

76 Posts

Posted - 2007-10-12 : 07:16:13

always do this before executing query
DBCC dropcleanbuffers
DBCC freeproccache

always check with set statistics io on
and check logical reads

When solution is simple, God is answering….

Kristen
Test

22859 Posts

Posted - 2007-10-12 : 07:48:46

"always do this before executing query
DBCC dropcleanbuffers
DBCC freeproccache"

... but probably not such a good idea if you are on a Production server!

ias0nas
Starting Member

36 Posts

Posted - 2007-10-15 : 07:49:04

Thank you all for your replies

I had a primary key so I added an extra index on the columns you suggested.
For now the query still times-out, but does it need any time to recustruct the files..?
Should I see the diference right away?

Thank you

Kristen
Test

22859 Posts

Posted - 2007-10-15 : 08:35:07

"I added an extra index on the columns you suggested"

Is that query using the new index? If not I recommend you Drop the new index so that it doesn't clutter things up, and try a different column combination for an index.

Did you run the COUNT() query to check that UserID is sufficiently "selective"?

Yes, you would see any performance improvment immedaitely.

Kristen

ias0nas
Starting Member

36 Posts

Posted - 2007-10-16 : 08:13:08

Yes, I did run the count query

How do I make sure the query is using the index? Should't it use it by itself?

Thank you

Kristen
Test

22859 Posts

Posted - 2007-10-16 : 08:17:36

"Should't it use it by itself?"

Yup, that's the idea, but if SQL Server decides the index won't speed the query it won't use it.

In which case you would then need to make an index which has better "selectivity" for the query.

Kristen

ias0nas
Starting Member

36 Posts

Posted - 2007-10-16 : 08:31:38

Uh, ok

One more question
The table has 3.5 million records and will have a lot more in the future. Is this an issue?

Thank you

ias0nas
Starting Member

36 Posts

Posted - 2007-10-16 : 08:59:03

Creating an index with just the idnewsletter (sorted descending), downloaded did the trick!
Is the 3.5 million records an issue..?

Thank you all.

Kristen
Test

22859 Posts

Posted - 2007-10-16 : 10:07:23

I reckon you need the [iduser] in there too for queries that can be helped via the index. You'll only know if its useful by checking the query plans though (i.e. you may not be able to time the improvement because it may be too variable in real world tests - but if it is used it will be saving some percentage-points, which will all add up over time).

"Is the 3.5 million records an issue..?"

No, but only if your queries are carefully optimised - which may in practice mean that you mean someone to audit your system and advise you on what to tweak.

Kristen

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources