Indexing question - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2008 Forums

Transact-SQL (2008)

Indexing question

Author

Topic

bert_r
Starting Member

3 Posts

Posted - 2010-03-08 : 06:31:11

Hello,

I've got a simple SELECT query with a large WHERE clause which retrieves a large number of fields (almost all fields in the table) for on record:
[CODE]SELECT TOP 1 ID,field1,field2,... FROM Table WHERE FieldA=123 AND FieldB=456 ... ORDER BY FieldZ[/CODE]
I'm not posing the actual query because I'm not allowed to, but I don't think it's really relevant.

Because this query is used a lot I'm trying to gain as much speed as possible. I tried adding an index which contains the ID and all the fields used in the WHERE and ORDER BY clause (but not in the SELECT clause), thinking this would improve performance a lot ... but it didn't. If I look at the execution plan I can see the index is not used, instead I see a 'Key lookup' which takes most of the time.

I then split the query so I only retrieve an ID field, and a second query which retrieves the data:
[CODE]declare @id int;
SELECT TOP 1 @id=ID FROM Table WHERE FieldA=123 AND FieldB=456 ... ORDER BY FieldZ;
SELECT TOP 1 ID,field1,field2,... FROM Table WHERE ID=@id;[/CODE]
The first select query uses an index seek using the index (which is now a covering index), the second query uses an index seek of the primary key.

When using the execution plans to compare the performance the first query uses 60%, while the second query (+3rd) uses 40%. I double-checked this with SQL profiler and indeed, splitting the query halves the number of reads (and the query duration).

I just don't understand why this is the case. Isn't SQL Server smart enough to use the index which covers all fields in the WHERE clause? Can I change the index so splitting the query isn't necessary? Or should I just go ahead and split the query?

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2010-03-08 : 07:13:15

The problem will probably be in your WHERE clause.

Are you using any wrapper functions around the columns (for example LIKE '%..', DATEDIFF(), etC)

if so then you won't be able to use an index on that column and I guess what's happening is the query plan must perform a clustered index lookup (key lookup) instead.

Can you at last post the whole text of the WHERE clause?

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION

bert_r
Starting Member

3 Posts

Posted - 2010-03-08 : 08:50:50

The only wrapper function used is ISNULL(). Could that be an issue?

Here's the where clause:

	SELECT TOP 1 _ID,...
	FROM Table
	WHERE
	(
		(_Category IN (select Category from @AllowedCategories))
		AND ( @P9=1 OR _SkillCategory IN (select Category from @SkillCategories))
		AND (_Mode IS NULL OR _Mode IN (0,1))
		AND (_LastResult IS NULL OR _LastResult<>0)
		AND (_TimeAvailable IS NULL OR _TimeAvailable<=GetDate())
		AND (_WorkingPriority>0)
		AND (_ID NOT IN (select id from @BlockedIDs))
		AND (ISNULL(_AppAgentId,0)<=0 OR _AppAgentId=@P10 OR (ISNULL(_CallNow,0)<>1 AND ISNULL(_AppForAgentOnly,0)<>1 AND _AppAgentId NOT IN (SELECT AgentId FROM AgentsLoggedOn)))
	) ORDER BY _WorkingPriority DESC;

Note that @AllowedCategories, @SkillCategories and @BlockedIDs are table variables created before this query.

After looking a bit better at the execution plan I see it actually uses both an index seek of the index and a key lookup of the primary key. The index seek is 18%, the key lookup 43%.

But when I split the query I use the exact same WHERE clause. Shouldn't the same thing occur as in the first query, which only SELECT's a few more columns?

EDIT: When I remove the ORDER BY clause both the index seek and the key lookup are replaced by a clustered index scan. But again, why is it different in the second query (reduced SELECT)? In that case a nonclustered index seek is used, regardless of the ORDER BY clause.

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-03-08 : 09:21:09

ANY function on a column prevents index usage, any at all. That include ISNULL.
In addition, the query form that you're using there (multiple @var IS NULL OR Col = @Var) does not perform well and will lead to very poor index usage.
http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/

--
Gail Shaw
SQL Server MVP

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2010-03-08 : 10:25:42

In addition to Gail's post you wrote:


AND ( @P9=1 OR _SkillCategory IN (select Category from @SkillCategories))

AND (_ID NOT IN (select id from @BlockedIDs))

... NOT IN (SELECT AgentId FROM AgentsLoggedOn)

Each of those would be better implemented using EXISTS or NOT EXISTS

For example


	SELECT TOP 1 _ID,...
	FROM Table
	WHERE
	(
		(_Category IN (select Category from @AllowedCategories))
		AND ...

Would be better as


	SELECT TOP 1 _ID,...
	FROM Table t
	WHERE
	(
		EXISTS (SELECT 1 FROM @allowedCategories ac WHERE ac.[Category] = t.[_Category])
		AND ..

Edited after discussion
EXISTS / NOT EXISTS may be faster than scanning an IN list. See here for more info and testing:

http://weblogs.sqlteam.com/peterl/archive/2009/06/12/Timings-of-different-techniques-for-finding-missing-records.aspx

See Gail's post below for more in-depth testing on recent environments
Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-03-08 : 11:02:12

Not always.
http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/

I did some, admittedly simple, tests and when the columns involved are not nullable, I had identical performance for NOT IN and NOT EXISTS. While I only show one execution in the blog post, I did several and all were just about the same. This is SQL 2008 SP1.

--
Gail Shaw
SQL Server MVP

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2010-03-08 : 11:24:53

I think in this case (and I'm guessing from the amount of ISNULL wrappers) that the data won't have been declared as NOT NULL. I think that EXISTS / NOT EXISTS will be faster in that environment.

Maybe since last I looked at it things have changed with 2008. My shop is almost always a release behind though so I'm stuck with 2005 for now.

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-03-08 : 11:26:32

I reproduced Peter's test (as best I can) on my 2008 server, and the screenshot below shows the results I had. Near-identical performance from the two. NOT IN averaged at 324ms of CPU time, NOT EXISTS at 332ms. That's small enough to be random interference from other apps.

http://sqlinthewild.co.za/wp-content/uploads/2010/03/InVsExists.png

--
Gail Shaw
SQL Server MVP

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-03-08 : 11:27:49

quote:
Originally posted by Transact CharlieI think that EXISTS / NOT EXISTS will be faster in that environment.

Faster, yes, with completely different results if there are any NULLs present in the referenced column in the subquery.

--
Gail Shaw
SQL Server MVP

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2010-03-08 : 11:31:25

quote:

Faster, yes, with completely different results if there are any NULLs present in the referenced column in the subquery.

Ah -- the quick n dirty way of performance tuning -- throw away half the results!

bert_r -- listen to Gail's advice re query formation. the whole NOT EXISTS / NOT IN thing is just nitpicking but you'll see a vast increase in performance if you change the way your query works. Either with dynamic sql or some other way.

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION

bert_r
Starting Member

3 Posts

Posted - 2010-03-08 : 11:52:36

Huh ... I never knew that 'x IS NULL OR 'x=123' would be inefficient. Originally it was ISNULL(x,123)=123, but I changed that after reading that IS NULL OR would be better. The other ISNULL()'s were added later - I was planning to replace these with 'IS NULL OR' ... it seems I have a lot of work to do!

Note that a lot of the fields checked with IS NULL will in fact be NULL. Most fields (_LastResult, _Mode, ...) are NULL until the first use of the record. Changing the scheme is extremely difficult, so I have to check for NULLs.

EDIT (after reading the linked article thoroughly): If I understand correctly a condition with '@x IS NULL OR @x=field' gives bad performance. In my query I'm not checking parameters though, I' mostly using constants: '_LastResult IS NULL OR _LastResult<>0'. Wouldn't that behave differently?

One thing I still don't understand: regardless of what the WHERE clause is, why does SELECT ID FROM [...] use the index, while SELECT ID,field1,field2,[...] FROM [...] doesn't? If the WHERE clause is identical, shouldn't the query plan be almost identical?

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2010-03-08 : 12:14:56

can be down to query plan reuse. Because you've got a general purpose query then the first time it runs you'll generate a plan that might be good for that particular instance (you passed it x variables...) because you don't change the text of your plan (plan are reused on a hash of the sql text -- or at least they are in 2005 --don't know about 2008) then the same query plan will be reused again even though this time you have passed it y paramaters.

If you change the sql text you'll get a new plan -- this one uses the index.

That's my guess anyway.

Gails link:
http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/

is a good jumping off point for more info.

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-03-08 : 12:25:06

quote:
Originally posted by bert_r
If I understand correctly a condition with '@x IS NULL OR @x=field' gives bad performance. In my query I'm not checking parameters though, I' mostly using constants: '_LastResult IS NULL OR _LastResult<>0'. Wouldn't that behave differently?

Maybe. Here you're going to have to do some testing yourself, see how the query behaves, how the plans look. It won't be as bad as the comparisons with constants (which really generate very bad execution plans), may still be problematic. Test and see.

quote:
One thing I still don't understand: regardless of what the WHERE clause is, why does SELECT ID FROM [...] use the index, while SELECT ID,field1,field2,[...] FROM [...] doesn't?

Because in the first case the index is covering, contains all the columns necessary for the query so SQL doesn't have to go and look up to the cluster or heap for the missing columns. Lookups are expensive operations, SQL doesn't like doing lots of them.

Forgive me for giving you another blog link, but this should explain things in more detail.
http://sqlinthewild.co.za/index.php/2009/01/09/seek-or-scan/

--
Gail Shaw
SQL Server MVP

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources