Non-clustered indexes - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

New to SQL Server Administration

Non-clustered indexes

Author

Topic

Larry Van Brocklin
Starting Member

8 Posts

Posted - 2009-10-05 : 15:43:19

Defining non-clustered indexes. Do I list all the equality columns from the 'Where' clause first, followed by the inequality columns from the 'Where' clause for the key columns and then list the remaining columns in the SELECT statement for the included columns to create a covering index?

870Remington

X002548
Not Just a Number

15586 Posts

Posted - 2009-10-05 : 18:02:36

The answer is...it depends

Got DDL and the query you need to write?

Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2009-10-05 : 20:28:24

Indexing is an art, and the only way to know for sure what exact index you'll need is to do load testing. DBAs will rarely just deploy an index to an environment without first taking a copy of production to play with, comparing execution plans and statistics between the different variations.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

"Let's begin with the premise that everything you've done up until this point is wrong."

Larry Van Brocklin
Starting Member

8 Posts

Posted - 2009-10-06 : 15:25:48

I appreciate the fact that you both took time to reply, however, neither reply is much in the way of helping me. I am not asking for the absolute best possible index for a particular query against a specific table(s), and who said anything about deploying without testing!

I am looking for a starting point from which to begin my testing so that I may arrive at an index that performs well.

I have looked and found that it 'depends' and that it is 'more an art form than a science' ad nauseum - so, I get that.

I have also seen where the order of key columns within the index MUST mimic the exact order of columns within the SELECT statement, or the index will not be used - and says nothing of the WHERE clause??????

Since I cannot find anyplace that comes out and says so, through deduction I have concluded that a good starting point is to include all of the 'equality columns ' (listed first) and then the 'inequality columns' from the WHERE clause as the 'key columns' and then list the remaining columns listed in the SELECT statement as the 'included columns' - and start my testing with a covering index defined with this model.

I am sure you both meant well and are far better at understanding all this than I am, but for someone trying to further his skills in the 'art' of building covering indexes, the 'starting point' from which to begin the testing process remains elusive.

So, unless there is absolutely no common 'starting point' (such as I stated above) from which to begin the process of building a covering index I shall keep looking.

Thanks again.

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2009-10-06 : 16:13:18

"I have also seen where the order of key columns within the index MUST mimic the exact order of columns within the SELECT statement, or the index will not be used..."

Don't know where you found that, but it's not true.

As a general rule, the index columns should be in order from most selective to least selective.

CODO ERGO SUM

X002548
Not Just a Number

15586 Posts

Posted - 2009-10-06 : 21:24:22

WOW

I'm honored to be flamed in the same statement with the SQL Warrior Princess

Good Luck Motley

Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2009-10-07 : 03:27:00

Are you specially interested in creating covering indexes or just indexes in general? Including columns in the nonclustered index will vastly increase it's size so you should be careful about adding too many. But as MVJ says the general rule is to index the most selective columns first (unique columns like id, ssn, guid and such and then maybe phonenumber, lastnames, firstnames,etc) and then include other columns that are not as selective.

You can also look in the books online index for "covering query with index [SQL Server]"

- Lumbago
http://xkcd.com/327/

Larry Van Brocklin
Starting Member

8 Posts

Posted - 2009-10-07 : 09:12:43

Sorry that I came across as "flaming" anyone - I am just looking for answers. No hard feelings :)

Re: Mimicking the order of the Index Key columns with the Query...I did get this out of a blog and tested it to confirm it was bad information. WARNING - not everything you read on the internet is true! For those of us on a quest to learn, sometimes it is hard to pick the fly shit out of the pepper!

Re: BOL for covering indexes suffers from the same lack of "when, why, or how" type information, but is great for definitions.

It IS covering indexes that I am focused. I have some tables that are very receptive to the added 'costs' of a covering index and therefore, I want to pursue this.

I will try to articulate my statements/questions and sprinkle in some of the results of my testing and my confusion will become apparent.

First my disclaimer - I am new to the art of indexing and some of my 'understandings' may be incorrect and therefore will appreciate any corrections. I ask that anyone providing information be sure of what they are saying so no misinformation leads me down a dead end.

I am working on the Standard Edition of SQL Server 2005. All of my tables have a Clustered Index and a Primary Key.

I created a non-clustered index listing two columns as equality columns, two columns as inequality columns and then several included columns. This index was created for testing, it was not created around a particular query. The create statement looked something like this using generic code:

CREATE NONCLUSTERED INDEX IX_tableABC_columnAequality_columnBequality_columnCinequality_columnDinequality_Incl_columns
ON tableABC([columnA], [columnB], [columnC], [columnD])
INCLUDE ([columnE], [columnF], [columnG], [columnH], [columnI], [columnJ], [columnK], [columnL] )

This model was demonstrated when looking at documentation around using the DMV for missing indexes, listing the equality columns first, followed by the inequality columns in the key columns.

I then constructed simple queries making columns A & B "equal" to specific values and columns C & D either as "IS NULL" or "not equal" to certain values; and then asked for some or all of the included columns. These queries always used this index.

If I deviated from using the equality columns for anything other than "equals' to OR used any of the inequality columns to be "equal" to something OR used any of the "included" columns in the WHERE clause - this index was never used. None of this was a surprise; therefore, it performed exactly as I would have expected.

Here is where I will try to explain my confusion...

It appears that the delineation between equality and inequality columns in the index create statement is merely listing the equality columns first (but alphabetically) and then the inequality columns (again, alphabetically). If this is true then it negates the opportunity to list the 'more selective' columns first as is the recommended practice.

It is often the case, however, that the columns listed in the WHERE clause are NOT very selective.

I have entered into this forum early into my testing process and continue testing as time permits. IF what I have said is true, problems such as inequality columns that would naturally follow the alphabetical order of the last equality column name would be miscontrued my SQL as an equality column and therefore SQL would not use the index when the column in question would be used as an inequality column in a WHERE clause.

I will be testing this as soon as I can. If anyone out there has done this type of testing I would be very interested to piggyback on your knowledge; or in the likely case that I have totally missed something important here - I would like to know that too!

I hope everyone is able to understand what I have written and maybe shed some light on this. Thanks for reading and especially to those who reply.

Larry

X002548
Not Just a Number

15586 Posts

Posted - 2009-10-07 : 09:42:28

you need to understand how indexes work.

I find it immensely amusing that you are so concerned right now about covering indexes

While they have their place...getting to the data is priority #1

The thinness of an index is also important

You also seem to be ignoring, or unaware, of the impact of DML operations may have...unless this is a pure reportng database, or warehouse (damn buzz words)

I take it you actually have no concrete examples you'd like to provide/ work with....and this exercise is merely axcadmeic

If not then post some real structures....and identify the volume of data

Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2009-10-07 : 10:16:01

I getting a little confused about what you're actually asking here, so please forgive me if my answers are not accurate but I have the feeling that you're misunderstanding the (in)significance of the INCLUDE part of the index.

The columns that form a part of the actual index and that are considered when running a query are the columns A, B, C and D in your example, *in that order*. I'll create a simple example with a 3 column table and 2 columns in the index:

CREATE TABLE mytable (ColA int, ColB int, ColC int)
CREATE NONCLUSTERED INDEX ix_mytable ON mytable (ColA, ColB) INCLUDE (ColC)
INERT INTO mytable
SELECT 1, 1, 54 UNION ALL SELECT 2, 2, 777 UNION ALL SELECT 3, 3, 12 UNION ALL
SELECT 4, 1, 9845 UNION ALL SELECT 5, 2, 44 UNION ALL SELECT 6, 3, 847

This table looks like this

ColA     ColB    ColC
1        1       54
2        2       777
3        3       12
4        1       9845
5        2       44
6        3       847

Now rewiew the execution plans for the following queries:

select * from mytable where ColA = 4
select * from mytable where ColA <> 4
select * from mytable where ColB = 2
select * from mytable where ColB <> 2
select * from mytable where ColA = 4 AND ColC = 846
select * from mytable where ColA = 4 AND ColC <> 846
select * from mytable where ColB = 2 AND ColC = 846
select * from mytable where ColB <> 2 AND ColC = 846
select * from mytable where ColC = 846
select * from mytable where ColC <> 846

Take special note of the queries where ColB = 2...this will give you an index scan because ColB is listed second in the index definition and all the data in ColB are not ordered. All queries involving ColC perform exactly the same because ColC is not a part of the index but only included. The last 2 queries would have given you a table scan if ColC was not included but filtering only by included columns will always give you an index scan.

- Lumbago
http://xkcd.com/327/

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2009-10-07 : 10:26:03

quote:
Originally posted by Larry Van Brocklin

I have looked and found that it 'depends' and that it is 'more an art form than a science' ad nauseum - so, I get that.

Well, it does depend...

quote:
I have also seen where the order of key columns within the index MUST mimic the exact order of columns within the SELECT statement, or the index will not be used

Total garbage.

quote:
Since I cannot find anyplace that comes out and says so, through deduction I have concluded that a good starting point is to include all of the 'equality columns ' (listed first) and then the 'inequality columns' from the WHERE clause as the 'key columns' and then list the remaining columns listed in the SELECT statement as the 'included columns' - and start my testing with a covering index defined with this model.

Maybe. The ordering is the one that will work best. The risk here is very big indexes and redundant indexes. It's not always desirable to create the absolute best index for every query. That's where the 'art form' comes from.

These may help
http://sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/

http://sqlinthewild.co.za/index.php/2009/02/06/index-columns-selectivity-and-inequality-predicates/

--
Gail Shaw
SQL Server MVP

Larry Van Brocklin
Starting Member

8 Posts

Posted - 2009-10-07 : 10:34:10

Brett,

I am glad that you find it “immensely amusing” that I have questions about something I would like to learn more about; and you are right – I do need to understand how indexes work – I thought that is what I was doing!

The example table that I am testing is heavily used in reporting and could be (at some point in time) become a source table for a warehouse, although it is not now. The data trickles in (no huge bulk inserts or deletes) and may (or may not) get modified for a period, but then stays just stays there for reporting purposes. Therefore, the impact of DML is minimal – you know I thought I mentioned before that the table was receptive to the added ‘costs’ of a covering index!

I could give ‘concrete’ examples but chose not to because this IS an academic exercise and I am not looking for specific answers to a specific problem, but rather asking broader questions about the subject of covering indexes. Are academic questions a bad thing?

Now that we have established that you have an innate talent for stating the obvious and a propensity to demean me, I have failed to see where you answered any of my questions.

...or did I miss something here?

Larry Van Brocklin
Starting Member

8 Posts

Posted - 2009-10-07 : 11:08:24

Lumbago,

I understand what you sent and I appreciate your efforts to help me.

I must not have been clear on one point - equality and inequality columns as defined in the create index statement.

Maybe I am having a fundamental problem with the understanding of equality and inequality columns as Key Columns as defined in the create index statement. So far, no one on the forum is addressing it and this is a big part of my question.

I am going to redo my testing using my testing index and verify what I thought I was seeing is true.

Thanks,

Larry

X002548
Not Just a Number

15586 Posts

Posted - 2009-10-07 : 11:14:06

Where did you even hear abouit covering indexes in the first place?

Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2009-10-07 : 11:48:25

quote:
Originally posted by Larry Van Brocklin

Maybe I am having a fundamental problem with the understanding of equality and inequality columns as Key Columns as defined in the create index statement. So far, no one on the forum is addressing it and this is a big part of my question.

Did you perhaps read the two blog posts I referred you to?

--
Gail Shaw
SQL Server MVP

X002548
Not Just a Number

15586 Posts

Posted - 2009-10-07 : 11:56:25

http://www.google.com/webhp?hl=en#hl=en&source=hp&q=sql+server+indexing+best+practices&aq=0&aqi=g1&oq=sql+server+indexing+best+pr&fp=2cca7b2e99206b9c

http://www.mssqltips.com/tip.asp?tip=1452

http://searchsqlserver.techtarget.com/generic/0,295582,sid87_gci1239952,00.html

http://demiliani.com/blog/archive/2007/02/16/5058.aspx

http://www.extremeexperts.com/SQL/Articles/BestPractices.aspx

http://sqlserverpedia.com/blog/sql-server-bloggers/my-top-5-sql-server-indexing-best-practices/

http://www.microsoft.com/emea/spotlight/sessionh.aspx?videoid=29

http://technet.microsoft.com/en-us/library/cc966523.aspx

http://technet.microsoft.com/en-us/library/cc966523.aspx

Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam

X002548
Not Just a Number

15586 Posts

Posted - 2009-10-07 : 12:09:35

In the beginning.....

http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf

http://en.wikipedia.org/wiki/Edgar_F._Codd

Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam

Larry Van Brocklin
Starting Member

8 Posts

Posted - 2009-10-07 : 12:14:06

Hey Gail,

I did go out to your suggested sites and recognized them as sites I had already visited.

In the meantime I went back to my testing index on my test database and performed some more tests and found that I had made a mistake that changes a lot.

In previous tests I mistakenly determined that when using an index where I had created equality and inequality columns and then ran a query where a column originally intended to be (for instance) an inequality column, and then used as an equality in the WHERE clause that it would not use that index. This did not make sense to me and so I was on a quest to find out how SQL would know the difference - when in fact - it does not.

A key column is a key column, and key columns should be ordered by selectivity.

Consequently, with that bit of misunderstanding cleared up then the ability to define the order of the more selective columns became clear again to me.

Test, retest, and make sure you correctly interpret the output - this is where I failed. Lesson learned.

Thanks to you and everyone else who replied.

Larry

X002548
Not Just a Number

15586 Posts

Posted - 2009-10-07 : 13:11:25

hey...
why don't you post the table DDL, with it's indexes and the query you were running

Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2009-10-07 : 17:37:39

quote:
Originally posted by Larry Van Brocklin

A key column is a key column, and key columns should be ordered by selectivity.

Not necessarily.

If 60% of queries on a table filter by ColA and ColB and the remaining 40% filter only by ColB, then it's not very useful to make the index ColA, ColB even if ColA is highly selective and ColB is less selective.

Selectivity has to be taken into account, but it's far from the only thing. That's what I was trying to get at in those posts.

If all the queries filter by ColA, ColB, ColC then sure, put the most selective one first. If all queries filter by ColC, most by ColB and ColC and the rest by ColA, ColB and ColC, then putting ColA as the leading column is senseless, doing so will ensure that most queries will have to scan rather than seek.

--
Gail Shaw
SQL Server MVP

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources