Author |
Topic |
Larry Van Brocklin
Starting Member
8 Posts |
Posted - 2009-10-05 : 15:43:19
|
Defining non-clustered indexes. Do I list all the equality columns from the 'Where' clause first, followed by the inequality columns from the 'Where' clause for the key columns and then list the remaining columns in the SELECT statement for the included columns to create a covering index?870Remington |
|
X002548
Not Just a Number
15586 Posts |
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2009-10-05 : 20:28:24
|
Indexing is an art, and the only way to know for sure what exact index you'll need is to do load testing. DBAs will rarely just deploy an index to an environment without first taking a copy of production to play with, comparing execution plans and statistics between the different variations.Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/Subscribe to my blog"Let's begin with the premise that everything you've done up until this point is wrong." |
 |
|
Larry Van Brocklin
Starting Member
8 Posts |
Posted - 2009-10-06 : 15:25:48
|
I appreciate the fact that you both took time to reply, however, neither reply is much in the way of helping me. I am not asking for the absolute best possible index for a particular query against a specific table(s), and who said anything about deploying without testing!I am looking for a starting point from which to begin my testing so that I may arrive at an index that performs well.I have looked and found that it 'depends' and that it is 'more an art form than a science' ad nauseum - so, I get that. I have also seen where the order of key columns within the index MUST mimic the exact order of columns within the SELECT statement, or the index will not be used - and says nothing of the WHERE clause??????Since I cannot find anyplace that comes out and says so, through deduction I have concluded that a good starting point is to include all of the 'equality columns ' (listed first) and then the 'inequality columns' from the WHERE clause as the 'key columns' and then list the remaining columns listed in the SELECT statement as the 'included columns' - and start my testing with a covering index defined with this model.I am sure you both meant well and are far better at understanding all this than I am, but for someone trying to further his skills in the 'art' of building covering indexes, the 'starting point' from which to begin the testing process remains elusive. So, unless there is absolutely no common 'starting point' (such as I stated above) from which to begin the process of building a covering index I shall keep looking.Thanks again. |
 |
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2009-10-06 : 16:13:18
|
"I have also seen where the order of key columns within the index MUST mimic the exact order of columns within the SELECT statement, or the index will not be used..."Don't know where you found that, but it's not true.As a general rule, the index columns should be in order from most selective to least selective.CODO ERGO SUM |
 |
|
X002548
Not Just a Number
15586 Posts |
|
Lumbago
Norsk Yak Master
3271 Posts |
Posted - 2009-10-07 : 03:27:00
|
Are you specially interested in creating covering indexes or just indexes in general? Including columns in the nonclustered index will vastly increase it's size so you should be careful about adding too many. But as MVJ says the general rule is to index the most selective columns first (unique columns like id, ssn, guid and such and then maybe phonenumber, lastnames, firstnames,etc) and then include other columns that are not as selective. You can also look in the books online index for "covering query with index [SQL Server]"- Lumbagohttp://xkcd.com/327/ |
 |
|
Larry Van Brocklin
Starting Member
8 Posts |
Posted - 2009-10-07 : 09:12:43
|
Sorry that I came across as "flaming" anyone - I am just looking for answers. No hard feelings :)Re: Mimicking the order of the Index Key columns with the Query...I did get this out of a blog and tested it to confirm it was bad information. WARNING - not everything you read on the internet is true! For those of us on a quest to learn, sometimes it is hard to pick the fly shit out of the pepper!Re: BOL for covering indexes suffers from the same lack of "when, why, or how" type information, but is great for definitions.It IS covering indexes that I am focused. I have some tables that are very receptive to the added 'costs' of a covering index and therefore, I want to pursue this.I will try to articulate my statements/questions and sprinkle in some of the results of my testing and my confusion will become apparent.First my disclaimer - I am new to the art of indexing and some of my 'understandings' may be incorrect and therefore will appreciate any corrections. I ask that anyone providing information be sure of what they are saying so no misinformation leads me down a dead end.I am working on the Standard Edition of SQL Server 2005. All of my tables have a Clustered Index and a Primary Key.I created a non-clustered index listing two columns as equality columns, two columns as inequality columns and then several included columns. This index was created for testing, it was not created around a particular query. The create statement looked something like this using generic code:CREATE NONCLUSTERED INDEX IX_tableABC_columnAequality_columnBequality_columnCinequality_columnDinequality_Incl_columns ON tableABC([columnA], [columnB], [columnC], [columnD]) INCLUDE ([columnE], [columnF], [columnG], [columnH], [columnI], [columnJ], [columnK], [columnL] )This model was demonstrated when looking at documentation around using the DMV for missing indexes, listing the equality columns first, followed by the inequality columns in the key columns.I then constructed simple queries making columns A & B "equal" to specific values and columns C & D either as "IS NULL" or "not equal" to certain values; and then asked for some or all of the included columns. These queries always used this index. If I deviated from using the equality columns for anything other than "equals' to OR used any of the inequality columns to be "equal" to something OR used any of the "included" columns in the WHERE clause - this index was never used. None of this was a surprise; therefore, it performed exactly as I would have expected.Here is where I will try to explain my confusion...It appears that the delineation between equality and inequality columns in the index create statement is merely listing the equality columns first (but alphabetically) and then the inequality columns (again, alphabetically). If this is true then it negates the opportunity to list the 'more selective' columns first as is the recommended practice.It is often the case, however, that the columns listed in the WHERE clause are NOT very selective.I have entered into this forum early into my testing process and continue testing as time permits. IF what I have said is true, problems such as inequality columns that would naturally follow the alphabetical order of the last equality column name would be miscontrued my SQL as an equality column and therefore SQL would not use the index when the column in question would be used as an inequality column in a WHERE clause.I will be testing this as soon as I can. If anyone out there has done this type of testing I would be very interested to piggyback on your knowledge; or in the likely case that I have totally missed something important here - I would like to know that too!I hope everyone is able to understand what I have written and maybe shed some light on this. Thanks for reading and especially to those who reply.Larry |
 |
|
X002548
Not Just a Number
15586 Posts |
Posted - 2009-10-07 : 09:42:28
|
you need to understand how indexes work.I find it immensely amusing that you are so concerned right now about covering indexesWhile they have their place...getting to the data is priority #1The thinness of an index is also importantYou also seem to be ignoring, or unaware, of the impact of DML operations may have...unless this is a pure reportng database, or warehouse (damn buzz words)I take it you actually have no concrete examples you'd like to provide/ work with....and this exercise is merely axcadmeicIf not then post some real structures....and identify the volume of dataBrett8-)Hint: Want your questions answered fast? Follow the direction in this linkhttp://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspxAdd yourself!http://www.frappr.com/sqlteam |
 |
|
Lumbago
Norsk Yak Master
3271 Posts |
Posted - 2009-10-07 : 10:16:01
|
I getting a little confused about what you're actually asking here, so please forgive me if my answers are not accurate but I have the feeling that you're misunderstanding the (in)significance of the INCLUDE part of the index.The columns that form a part of the actual index and that are considered when running a query are the columns A, B, C and D in your example, *in that order*. I'll create a simple example with a 3 column table and 2 columns in the index:CREATE TABLE mytable (ColA int, ColB int, ColC int)CREATE NONCLUSTERED INDEX ix_mytable ON mytable (ColA, ColB) INCLUDE (ColC)INERT INTO mytableSELECT 1, 1, 54 UNION ALL SELECT 2, 2, 777 UNION ALL SELECT 3, 3, 12 UNION ALLSELECT 4, 1, 9845 UNION ALL SELECT 5, 2, 44 UNION ALL SELECT 6, 3, 847 This table looks like thisColA ColB ColC1 1 542 2 7773 3 124 1 98455 2 446 3 847 Now rewiew the execution plans for the following queries:select * from mytable where ColA = 4select * from mytable where ColA <> 4select * from mytable where ColB = 2select * from mytable where ColB <> 2select * from mytable where ColA = 4 AND ColC = 846select * from mytable where ColA = 4 AND ColC <> 846select * from mytable where ColB = 2 AND ColC = 846select * from mytable where ColB <> 2 AND ColC = 846select * from mytable where ColC = 846select * from mytable where ColC <> 846 Take special note of the queries where ColB = 2...this will give you an index scan because ColB is listed second in the index definition and all the data in ColB are not ordered. All queries involving ColC perform exactly the same because ColC is not a part of the index but only included. The last 2 queries would have given you a table scan if ColC was not included but filtering only by included columns will always give you an index scan.- Lumbagohttp://xkcd.com/327/ |
 |
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2009-10-07 : 10:26:03
|
quote: Originally posted by Larry Van Brocklin I have looked and found that it 'depends' and that it is 'more an art form than a science' ad nauseum - so, I get that.
Well, it does depend...quote: I have also seen where the order of key columns within the index MUST mimic the exact order of columns within the SELECT statement, or the index will not be used
Total garbage.quote: Since I cannot find anyplace that comes out and says so, through deduction I have concluded that a good starting point is to include all of the 'equality columns ' (listed first) and then the 'inequality columns' from the WHERE clause as the 'key columns' and then list the remaining columns listed in the SELECT statement as the 'included columns' - and start my testing with a covering index defined with this model.
Maybe. The ordering is the one that will work best. The risk here is very big indexes and redundant indexes. It's not always desirable to create the absolute best index for every query. That's where the 'art form' comes from.These may helphttp://sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/http://sqlinthewild.co.za/index.php/2009/02/06/index-columns-selectivity-and-inequality-predicates/--Gail ShawSQL Server MVP |
 |
|
Larry Van Brocklin
Starting Member
8 Posts |
Posted - 2009-10-07 : 10:34:10
|
Brett,I am glad that you find it “immensely amusing” that I have questions about something I would like to learn more about; and you are right – I do need to understand how indexes work – I thought that is what I was doing!The example table that I am testing is heavily used in reporting and could be (at some point in time) become a source table for a warehouse, although it is not now. The data trickles in (no huge bulk inserts or deletes) and may (or may not) get modified for a period, but then stays just stays there for reporting purposes. Therefore, the impact of DML is minimal – you know I thought I mentioned before that the table was receptive to the added ‘costs’ of a covering index!I could give ‘concrete’ examples but chose not to because this IS an academic exercise and I am not looking for specific answers to a specific problem, but rather asking broader questions about the subject of covering indexes. Are academic questions a bad thing?Now that we have established that you have an innate talent for stating the obvious and a propensity to demean me, I have failed to see where you answered any of my questions....or did I miss something here? |
 |
|
Larry Van Brocklin
Starting Member
8 Posts |
Posted - 2009-10-07 : 11:08:24
|
Lumbago,I understand what you sent and I appreciate your efforts to help me.I must not have been clear on one point - equality and inequality columns as defined in the create index statement.Maybe I am having a fundamental problem with the understanding of equality and inequality columns as Key Columns as defined in the create index statement. So far, no one on the forum is addressing it and this is a big part of my question.I am going to redo my testing using my testing index and verify what I thought I was seeing is true.Thanks,Larry |
 |
|
X002548
Not Just a Number
15586 Posts |
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2009-10-07 : 11:48:25
|
quote: Originally posted by Larry Van Brocklin Maybe I am having a fundamental problem with the understanding of equality and inequality columns as Key Columns as defined in the create index statement. So far, no one on the forum is addressing it and this is a big part of my question.
Did you perhaps read the two blog posts I referred you to?--Gail ShawSQL Server MVP |
 |
|
X002548
Not Just a Number
15586 Posts |
|
X002548
Not Just a Number
15586 Posts |
|
Larry Van Brocklin
Starting Member
8 Posts |
Posted - 2009-10-07 : 12:14:06
|
Hey Gail,I did go out to your suggested sites and recognized them as sites I had already visited.In the meantime I went back to my testing index on my test database and performed some more tests and found that I had made a mistake that changes a lot.In previous tests I mistakenly determined that when using an index where I had created equality and inequality columns and then ran a query where a column originally intended to be (for instance) an inequality column, and then used as an equality in the WHERE clause that it would not use that index. This did not make sense to me and so I was on a quest to find out how SQL would know the difference - when in fact - it does not.A key column is a key column, and key columns should be ordered by selectivity.Consequently, with that bit of misunderstanding cleared up then the ability to define the order of the more selective columns became clear again to me.Test, retest, and make sure you correctly interpret the output - this is where I failed. Lesson learned.Thanks to you and everyone else who replied.Larry |
 |
|
X002548
Not Just a Number
15586 Posts |
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2009-10-07 : 17:37:39
|
quote: Originally posted by Larry Van Brocklin A key column is a key column, and key columns should be ordered by selectivity.
Not necessarily.If 60% of queries on a table filter by ColA and ColB and the remaining 40% filter only by ColB, then it's not very useful to make the index ColA, ColB even if ColA is highly selective and ColB is less selective.Selectivity has to be taken into account, but it's far from the only thing. That's what I was trying to get at in those posts.If all the queries filter by ColA, ColB, ColC then sure, put the most selective one first. If all queries filter by ColC, most by ColB and ColC and the rest by ColA, ColB and ColC, then putting ColA as the leading column is senseless, doing so will ensure that most queries will have to scan rather than seek.--Gail ShawSQL Server MVP |
 |
|
|