Index optimization question - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

Transact-SQL (2005)

Index optimization question

Author

Topic

mtl777
Yak Posting Veteran

63 Posts

Posted - 2010-03-24 : 12:35:22

Hey guys, if you have a query like:

SELECT * FROM MyTable WHERE LastName = @LastName AND FirstName = @FirstName

which index would work and be better for this? ...

(1) A single index with the key Lastname, FirstName

- or -

(2) Two indexes, one for LastName and one for FirstName

What are the pro's and con's of each?

Thanks!

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-03-24 : 12:42:42

if you consider query alone a covering index like 1 will be more benefitial.

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2010-03-24 : 12:55:22

Do you really need to do a SELECT * ?

If you did and there are many columns then it won't really matter as you'll have to do a primary key lookup anyway after searching the index.

Better to only return what you really need and to INCLUDE the columns in the INDEX.

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION

Kristen
Test

22859 Posts

Posted - 2010-03-24 : 13:36:06

" which index would work and be better for this? "

Well, I'll take option (3)

As the query stands an index on LastName or FirstName, or a compound index on both will be used (and so, probably, will both indexes if there is one on each separate column)

An index will only be used if it is "selective" - i.e. most of the entries are unique; this selectiveness-test is only performed on the first key column in the index.

Thus the choice of a compound index "LastName, FirstName" or "FirstName, LastName" should be made on the basis of whether Lastname is more selective than FirstName (if YES then use the first, if NO then use the second definition)

Also, if there are other columns in the WHERE clause (or JOIN conditions) then including them in the index will "cover" the WHERE / JOIN clause

So if you had


SELECT * FROM MyTable WHERE LastName = @LastName AND FirstName = @FirstName AND IsActive=1

and assuming that IsActive only has values 0 and 1, and they are evenly split in the database (i.e. NOT selective at all!) then adding IsActive to the END of the index key list will "cover" that test too - i.e. once First and Last Name have been found in the index then IsActive can also be determined from the index.

Lastly, as TC has said, if the columns in the SELECT clause are also included in the index then the index will cover the SELECT too - so SQL will never have to go to the actual table. IME it is much more rare to have an Index that covers the SELECT clause - except where a list of (say) IDs is being retrieved to then perform some further action.

mtl777
Yak Posting Veteran

63 Posts

Posted - 2010-03-24 : 16:39:37

Thanks everyone for your replies! You have given me more insight on how indexes work.

I would now like to go into deeper details. If you have a compound index like (LastName, FirstName), does SQL Server perform a binary seek only on the first column (LastName), and once a matching LastName is found, perform a sequential scan (either forward or backward) within the group of index rows that have the matching LastName to find the ones that have a matching FirstName? I would imagine that to be slow. If, instead, SQL Server does a binary seek on the compound key (LastName, FirstName), it would be much faster. Any thoughts?

Kristen
Test

22859 Posts

Posted - 2010-03-24 : 17:12:55

If you have an equals test, or a "range test" (LastName >= 'AAA' AND LastName <= 'Nzzz') then SQL does a Range on the index - finds the first item where LastName matches the crtieria, and then walks the index sequentially until LastName no longer matches the criteria

mtl777
Yak Posting Veteran

63 Posts

Posted - 2010-03-24 : 21:07:32

What if you're searching LastName = 'Williams' AND FirstName = 'Victor'? Will SQL Server do a binary seek for LastName = 'Williams' only, then once that's found, scan the index sequentially to find FirstName = 'Victor'? For example, let's say that the index has 70,000 people having a LastName of 'Williams':

1. Williams, Aaron
2. Williams, Aaron
3. Williams, Abe
:
:
60003. Williams, Victor
60004. Williams, Victor
60005. Williams, Victor
:
:
69999. Williams, Zed
70000. Williams, Ziggy

Let's say that SQL Server does a binary seek for LastName = 'Williams' only, and doing so, it happens to land on the index row containing Williams, Abe. Then from that point it will have to scan sequentially through 60,000 rows in the index before it finds the first occurrence of Williams, Victor! That would be very slow, right? Does SQL Server really do that?

Knowing that a binary seek is sooo very much faster than a sequential scan, can't SQL Server optimize the query by doing a binary seek for the combination instead (Williams, Victor)? By doing so, it will get to one of the Williams, Victor entries immediately (courtesy of the very fast binary seek algorithm). Then from that point it can scan sequentially the few nearby rows to finish up finding all the Williams, Victor entries. That would be so much faster since the sequential scan will now be on very few rows only!

I would really like to get confirmation on whether SQL Server can and will perform a binary seek on the entire compound key instead of the first column only, when optimizing any query like this that could make good use of a compound index.

Kristen
Test

22859 Posts

Posted - 2010-03-25 : 04:30:32

It will seek on all keys (and then if the index is NOT unique will sequentially walk to get the other values).

Think about a multi-part Primary Key - SQL has to be efficient on that, so needs to go directly to the specific entry.

The only thing the first key fields is used for (in this regard) is for the Statistics of how Selective the index will be - this is used by the Query Planner to decide whether Index A is better to use than Index B

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-03-25 : 10:08:54

quote:
Originally posted by mtl777

What if you're searching LastName = 'Williams' AND FirstName = 'Victor'? Will SQL Server do a binary seek for LastName = 'Williams' only, then once that's found, scan the index sequentially to find FirstName = 'Victor'?

No. It will seek down the b-tree to the first instance of 'Williams', 'Victor' and then read sequentially until it has found all of the 'Williams', 'Victor' records.

http://sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/

--
Gail Shaw
SQL Server MVP

mtl777
Yak Posting Veteran

63 Posts

Posted - 2010-03-25 : 12:24:37

Thank you so much, guys! That's good to know. And thanks, Gail, for your very informative article!

Now let's take a look at the second indexing option in my original question -- that of having two separate indexes, one for LastName and one for FirstName. How exactly would SQL Server use these indexes to search for LastName = 'Williams' AND FirstName = 'Victor' in the most efficient manner possible? I would imagine that it might seek the LastName index for Williams, but when it finds Williams in the index, the entry found may not correspond to Victor in the table. And it also might seek the FirstName index for Victor, but when it finds Victor in the index, it may not correspond to Williams in the table. So how would SQL Server be able to "glue" these two together to find the matching Williams, Victor entries?

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-03-25 : 14:04:26

Generally, in the vast majority of cases, SQL will use one of those indexes, whichever it thinks is the most selective. Let's say for the purposes of the discussion that the LastName is more selective. It will do a seek on the LastName index, fetch all the rows for 'Williams', then do a lookup to the cluster/heap for the rest of the columns required. Once it has the FirstName column it will then filter out rows that don't have the FirstName = 'Victor'
That's assuming that the index is selective enough for the lookups not to be too expensive. If they are, SQL will ignore both indexes and do a table scan.

That's the usual case. It is sometimes possible for SQL to do an index intersection. This is not a common choice.

An index intersection involves seeking on the LastName index, fetching all the index entries that match LastName = 'Williams'. Then SQL does a seek on the FirstName index, fetches all the index entries that match FirstName = 'Victor'. Once it has those two resultsets, it does a join to get the rows that are present in both.

--
Gail Shaw
SQL Server MVP

mtl777
Yak Posting Veteran

63 Posts

Posted - 2010-03-25 : 15:42:09

Thank you so much! That is great info!

Based on that, I would conclude that the compound index (LastName, FirstName) is better for the query LastName = 'Williams' AND FirstName = 'Victor' than two separate indexes.

Just a couple more questions while we're at it

: Can you also use a compound index as a covering index? For example, (LastName, FirstName, PhoneNum, IsLiving, IsVegetarian), where IsLiving and IsVegetarian are bit flags (1 = Yes, 0 = No). So if you have a query like, LastName = 'Jones' AND IsLiving = 1, will SQL Server seek this index for LastName = 'Jones' then scan the index for IsLiving = 1 (within the 'Jones' values), i.e., not have to go to the table to check the IsLiving value? Or how about for a simple query like, PhoneNum = '2137779999', will SQL Server scan this index instead of the table? I ask because scanning the index would be faster than scanning the table if the table is large and has many columns.

Thanks again!

Kristen
Test

22859 Posts

Posted - 2010-03-25 : 16:37:19

"I would conclude that the compound index is better"

Probably better that you consider how well the index "covers" the query, rather than a compound index being better, per se. SQL can. sometimes, use the intersection of two indexes, but a covering index is far more useful - only one index seek and you have all (or most of) the data to resolve a JOIN or WHERE clause

"LastName = 'Jones' AND IsLiving = 1, will SQL Server seek this index for LastName = 'Jones' then scan the index for IsLiving = 1 (within the 'Jones' values)"

In general yes. (Depends on how selective the index is for "LastName = 'Jones'")

I don't think you can have a BIT column in an index - although i think that changed somewhere - SQL2008 maybe?

Definitely no point having the BIT column as the FIRST field in the index (unless it has 99% of the values one way, and then only worth using the other 1% and thus a FILTERED INDEX (SQL2008) would be best - it will only store the 1% and not bother with the 99%

)

"for a simple query like, PhoneNum = '2137779999', will SQL Server scan this index instead of the table?"

In general NO, a table scan (or a different index, if there are other criteria in the WHERE clause) will be faster

"I ask because scanning the index would be faster than scanning the table if the table is large and has many columns."

Nope ... scanning the whole of the index, and then doing a lookup on the main table to get the other data columns is a huge amount of disk head travel, blocks read into memory, cache stuff (that might be useful) flushed for low-hit pages, and so on.

I don't think SQL has any statistics for how useful PhoneNum, within the compound-index, would be - and thus could not evaluate roughly how many hits there would be doing a sequential read of the whole index.

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2010-03-25 : 16:48:49

quote:
Originally posted by Kristen

I don't think you can have a BIT column in an index - although i think that changed somewhere - SQL2008 maybe?

Sure you can. There is a missconception you can't because enterpirse manager didn't have an option for doing it or something.

Kristen
Test

22859 Posts

Posted - 2010-03-25 : 17:34:10

Well I never ... I can even create a BIT index on SQL2000.

SQL 2000 Enterprise Manager does NOT show the BIT columns via:

Table : Design : Properties : Indexes/Keys

so you are spot on

if you right click the table and do MANAGE INDEXES then you can create an index with a BIT column, no problems.

Well-I-never!!

mtl777
Yak Posting Veteran

63 Posts

Posted - 2010-03-25 : 19:24:05

quote:
Originally posted by Kristen

"for a simple query like, PhoneNum = '2137779999', will SQL Server scan this index instead of the table?"

In general NO, a table scan (or a different index, if there are other criteria in the WHERE clause) will be faster

"I ask because scanning the index would be faster than scanning the table if the table is large and has many columns."

Nope ... scanning the whole of the index, and then doing a lookup on the main table to get the other data columns is a huge amount of disk head travel, blocks read into memory, cache stuff (that might be useful) flushed for low-hit pages, and so on.

I don't know what I'm missing but scanning a table seems much slower to me. For example, let's say that the table has a row size of 10,000 bytes and the PhoneNum of 2137779999 is found at row # 100,000. That would mean having to read 1,000,000,000 bytes from the table (100,000 rows of 10,000 bytes each) just to find that phone #. On the other hand, if the index has a row size of 300 bytes, the index scan would need to read only 30,000,000 bytes from the index (100,000 rows of 300 bytes each) to find the phone #. And once it's found, getting to the desired row in the table is a quick and simple matter since the index has a pointer to the row. It's a quick random access, not a sequential access, for the hard drive to get to the remaining data in the table once the row pointer is known.

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-03-26 : 04:32:40

quote:
Originally posted by Kristen
"for a simple query like, PhoneNum = '2137779999', will SQL Server scan this index instead of the table?"

In general NO, a table scan (or a different index, if there are other criteria in the WHERE clause) will be faster

A table scan will not be faster than an index scan. Indexes are smaller than the table (unless someone's been silly and included every single column), hence incur less IOs to scan.

If there's an index that covers the query and has the where clause predicate as the leading column of the index, SQL will seek on that index.
If there's an index that covers the query and the where clause predicate is not the leading column of the index, SQL will likely scan the index.
If the index doesn't covers the query and has the where clause predicate as the leading column of the index, SQL will seek unless the cost of the bookmark lookups are too expensive
If the index doesn't covers the query and the where clause predicate is not the leading column of the index, SQL will likely not use the index, will look for another that covers or go to a table scan.

--
Gail Shaw
SQL Server MVP

Kristen
Test

22859 Posts

Posted - 2010-03-26 : 04:34:55

"I don't know what I'm missing but scanning a table seems much slower to me"

Indeed, I thought that when I started out with SQL, but I presume MS et al have put time and effort in testing it, and it isn't the case.

SQL doesn't know that there is only one entry in the index that matches the Phone Number; you are a human, you know that, but SQL has to make assuptions based on statistics of the tables; SQL doesn't (AFAIK) have statisics for columns within indexes. So it will have to assume that N% match (I don't know what N is, but I believe SQL has stnadard values for assumptions where there are no statistics).

So assuming that, say, 30% of the rows in the index will match then SQL will calculate the time for head travel etc. to look up all those rows in the main table and decide that using the index is slower than a table scan.

Try it - you can force SQL to use that index using an Index HINT, and time it and see what the difference is. If you have found a case where SQL performs badly then that is what the HINT is there for - to encourage (you can't actually "force" SQL to use a hint) SQL to do it a better way.

If you want an index that is on phone numbers that are unique don't bury it in another index. A specific, unique, index on phone number would tell the SQL query analyser very clearly that every row is unique - i.e. an EQUALS match will find, at most, one row, and is thus very efficient.

You are trying to put forward your ideas of how you think it should work, and are surprised at how it actually works. I think you would do better to Google for some articles on the subject and read up on how it actually works, proven over years of testing etc., and learn to optimise your queries accordingly; this will include finding out which indexes work, why, how to interpret Query Plans and use them to decide what columns to include in a new index, why an index is not being used (where you expected it to be), and so on.

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-03-26 : 04:36:22

quote:
Originally posted by mtl777 For example, let's say that the table has a row size of 10,000 bytes and the PhoneNum of 2137779999 is found at row # 100,000. That would mean having to read 1,000,000,000 bytes from the table (100,000 rows of 10,000 bytes each) just to find that phone #.

No, It would mean at least that many reads (and we measure reads in pages, 8kb chunks, not bytes). SQL won't stop when it finds a matching record unless it knows there's uniqueness, which it can only know if there's a unique index/constraint. So if it's scanning an index/table for phone numbers = 2137779999 and there's no unique index/constraint on phone numbers (if there was it wouldn't have to scan), it has to read the entire table/index to make sure that it's found all matches.

--
Gail Shaw
SQL Server MVP

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-03-26 : 04:39:48

quote:
Originally posted by KristenNope ... scanning the whole of the index, and then doing a lookup on the main table to get the other data columns is a huge amount of disk head travel, blocks read into memory, cache stuff (that might be useful) flushed for low-hit pages, and so on.

Ok, if the index isn;t covering, no way SQL will scan it. If the index is covering, it'll be faster than a table scan

quote:
I don't think SQL has any statistics for how useful PhoneNum, within the compound-index, would be - and thus could not evaluate roughly how many hits there would be doing a sequential read of the whole index.

If auto-create stats is on, the optimiser would request stats for phone number be created so that it does have a rough idea.

--
Gail Shaw
SQL Server MVP

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-03-26 : 04:42:07

quote:
Originally posted by KristenIf you have found a case where SQL performs badly then that is what the HINT is there for - to encourage (you can't actually "force" SQL to use a hint) SQL to do it a better way.

Hints are directives, not suggestions. The optimiser is not allowed to ignore them. If you add a hint with (index = 2), that query MUST used index 2 and the optimiser has no options to chose a different index.
(note that locking hints have a slightly different behaviour)

--
Gail Shaw
SQL Server MVP

Next Page

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources