Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 SQL Server Administration (2005)
 Understanding Non Custer Index

Author  Topic 

dewacorp.alliances

452 Posts

Posted - 2009-03-16 : 20:33:35
Hi there

I have about 15+ million records in one table and it's increasing rapidly.

From my understanding that whatever you have in the query "WHERE" statement is, I should put into into a cluster index. I've notice I can have 1 index with more than 1 columns included.

Now my question is, let say if I have 2 queries like these:

Query 1:
SELECT *
WHERE FieldName1=??? AND FieldName2=???

Query 2:
SELECT *
WHERE FieldName1=??? AND FieldNAme3=???

Does this mean ideally I should create 2 indexes like these ?!?!

IX_Index1:
Include FieldName1 and FieldName2

IX_Index2:
Include FieldName1 and FieldName2

Can some one tell me the best practice is?

I'm appreciated your help.

Thanks

lionofdezert
Aged Yak Warrior

885 Posts

Posted - 2009-03-17 : 07:36:28
Yes you need two non clustered indexes to meet your need for two different queries with two different columns in WHERE clause:

CREATE NONCLUSTERED INDEX [IX_Index1] ON [yourTable]
(
[FieldName1 ] ASC,
[FieldName2] ASC
)

CREATE NONCLUSTERED INDEX [IX_Index2] ON [yourTable]
(
[FieldName1 ] ASC,
[FieldName3] ASC
)

INCLUEDE is a different thing, use include when you have columns in your query which are part of your select queiry but not a part of WHERE clause.
Go to Top of Page

lionofdezert
Aged Yak Warrior

885 Posts

Posted - 2009-03-17 : 08:16:04
Non-clustered indexes are best for queries:

* That return few rows (including just one row) and where the index has good selectivity (generally above 95%).
* That retrieve small ranges of data (not large ranges). Clustered indexes perform better for large range queries.
* Where both the WHERE clause and the ORDER BY clause are both specified for the same column in a query. This way, the non-clustered index pulls double duty. It helps to speed up accessing the records, and it also speeds up the sorting of the records (because the returned data is already sorted).
* That use JOINs (although clustered indexes are better if they can be used).
* When the column or columns to be indexed are very wide. While wide indexes are never a good thing, if you have no choice, a non-clustered index will have overall less overhead than a clustered index on a wide index.
Go to Top of Page

mfemenel
Professor Frink

1421 Posts

Posted - 2009-03-17 : 09:22:22
Let me ask you a few more questions. Do you really need to be doing select *? A covering index may help your performance if you have a more precise select statement. Do you have a clustered index on the table right now? If so what is it on? It would help us to see your table structure. If you create a non clustered index and only supply your where clause columns it's going to have to go back to the table and look up all the other values it needs to return because of the select *. If you have a clustered index you would be better served to make that useful because then it only has scan the clustered index to return everything it needs and you don't need to do the included column.

Mike
"oh, that monkey is going to pay"
Go to Top of Page

pootle_flump

1064 Posts

Posted - 2009-03-17 : 09:31:06
quote:
Originally posted by lionofdezert

Non-clustered indexes are best for queries:

* That return few rows (including just one row) and where the index has good selectivity (generally above 95%).
* That retrieve small ranges of data (not large ranges). Clustered indexes perform better for large range queries.
* Where both the WHERE clause and the ORDER BY clause are both specified for the same column in a query. This way, the non-clustered index pulls double duty. It helps to speed up accessing the records, and it also speeds up the sorting of the records (because the returned data is already sorted).
* That use JOINs (although clustered indexes are better if they can be used).
* When the column or columns to be indexed are very wide. While wide indexes are never a good thing, if you have no choice, a non-clustered index will have overall less overhead than a clustered index on a wide index.

Some of this stuff isn't exactly spot on, or at least needs qualifying - when a nonclustered index covers the query, it behaves just like a clustered index. In this case the stuff about selectivity, JOINS, small ranges etc. changes
Go to Top of Page

dewacorp.alliances

452 Posts

Posted - 2009-03-18 : 02:02:47
quote:
Originally posted by mfemenel

Let me ask you a few more questions. Do you really need to be doing select *? A covering index may help your performance if you have a more precise select statement. Do you have a clustered index on the table right now? If so what is it on? It would help us to see your table structure. If you create a non clustered index and only supply your where clause columns it's going to have to go back to the table and look up all the other values it needs to return because of the select *. If you have a clustered index you would be better served to make that useful because then it only has scan the clustered index to return everything it needs and you don't need to do the included column.



Hi Mike.

The query above is only an example and yes I do have precise select statement. I don't think I have Clustered Index specifically I only have:

Primary Key - PK_prf_BatchItems for the BatchItemID
Index - IX_prf_BacthItems_1 for the AccountNo, Code1, Code2, Code3
Index - IX_prf_BacthItems_2 for BatchID

Basically the table like this:
BatchItemID (PK) - Identity
BatchID INT
AccountNo Varchar(32),
Code1 Varchar(32),
Code2 VarCHar(33),
NetAmount MONEY
plus others ...

This table is like a FACT table in datawarehousing and the LOAD is always happening never deleted (this is based on the BatchID). So ... 1 load possibly ha 1.5 million with BatchID=1 and next one load again 1.1 million records and BatchID=2 and soon and soon.
As you can see the growth quite massively .. we are reaching 20GB+ easily on this table itself and potential 250+GB for that table.

The report as based on BatchID though so it always does
WHERE BacthID=X AND (other criteria). and some times even combined
WHERE BacthID IN (X, Y) AND (other criteria)

I am kinda concern about the performance.

Thanks

Go to Top of Page

mfemenel
Professor Frink

1421 Posts

Posted - 2009-03-18 : 08:30:29
That "other criteria" part is what I meant by needing a precise select statement. It would help if we could see your "other criteria" so we can make a more educated stab at your indexing.

Mike
"oh, that monkey is going to pay"
Go to Top of Page

dewacorp.alliances

452 Posts

Posted - 2009-03-19 : 02:20:11
quote:
Originally posted by mfemenel

That "other criteria" part is what I meant by needing a precise select statement. It would help if we could see your "other criteria" so we can make a more educated stab at your indexing.



The criteria will be:
1. WHERE BacthID=X AND Code1=XXX
2. WHERE BatchID=X AND Code1=XXX and Code2=XXX
3. WHERE BacthID IN (X, Y) AND Code1=XXX
4. WHERE BacthID IN (X, Y) AND Code1=XXX AND COde2=XXX
5. WHERE BacthID=X AND AccountNo=XXX
and possibly more.

Hope this one clear.

Does this one means that I need 5 Clustered indexes? Or 5 Non Clustered Indexes? Also the data is very wide range though, the selection for BatchID range is between 800K to 1.5 million records.


Go to Top of Page

pootle_flump

1064 Posts

Posted - 2009-03-19 : 04:20:30
How many rows, typically, will those five queries return?
Go to Top of Page

dewacorp.alliances

452 Posts

Posted - 2009-03-19 : 08:03:02
quote:
Originally posted by pootle_flump

How many rows, typically, will those five queries return?



It depends but generally for the 1st query will be like 80% data of the total of BatchID = X so we are talking about 1 million records.

For the query no 3, because each BatchID containing almost 1 million records so if it's 2x BatchID involved means close 2-3 millions.

Go to Top of Page

pootle_flump

1064 Posts

Posted - 2009-03-19 : 10:22:34
A nonclustered index is used to identify rows efficiently. SQL Server uses the index to find the rows, but then has to go to the base table to get the data. This trip is called a lookup, and is expensive. If the query returns a small proportion of the table, it is worth doing. More than around 5% of the table (often around 3% IME) it is cheaper to ignore the index and scan the table.

However, a covering index is one where all the columns in the query are part of the index (or leaf pages). In this case, SQL Server does not have to visit the table. As such, they are really efficient. However, you use the dreaded "SELECT *". Do you actually use all columns returned by this?
Go to Top of Page

dinakar
Master Smack Fu Yak Hacker

2507 Posts

Posted - 2009-03-19 : 11:39:32
In addition to all that has been meentioned, its the left most column in the index that is critical and decides if the index can be used by the optimizer. Between field1 and field2 find the one that is most distinct (has most selectivity.. or is most unique) and use that as the left most column. the more unique the index is, the better are its chances of being used.

Dinakar Nethi
************************
Life is short. Enjoy it.
************************
http://weblogs.sqlteam.com/dinakar/
Go to Top of Page

dewacorp.alliances

452 Posts

Posted - 2009-03-19 : 16:50:10
quote:
Originally posted by pootle_flump

A nonclustered index is used to identify rows efficiently. SQL Server uses the index to find the rows, but then has to go to the base table to get the data. This trip is called a lookup, and is expensive. If the query returns a small proportion of the table, it is worth doing. More than around 5% of the table (often around 3% IME) it is cheaper to ignore the index and scan the table.

However, a covering index is one where all the columns in the query are part of the index (or leaf pages). In this case, SQL Server does not have to visit the table. As such, they are really efficient. However, you use the dreaded "SELECT *". Do you actually use all columns returned by this?



Covering index? Is there such thing in the sql server 05? Regarding the select * as I said I am not using in query I specify the columns that I want to pick.
Go to Top of Page

pootle_flump

1064 Posts

Posted - 2009-03-20 : 07:13:05
quote:
Originally posted by dewacorp.alliances

quote:
Originally posted by pootle_flump

A nonclustered index is used to identify rows efficiently. SQL Server uses the index to find the rows, but then has to go to the base table to get the data. This trip is called a lookup, and is expensive. If the query returns a small proportion of the table, it is worth doing. More than around 5% of the table (often around 3% IME) it is cheaper to ignore the index and scan the table.

However, a covering index is one where all the columns in the query are part of the index (or leaf pages). In this case, SQL Server does not have to visit the table. As such, they are really efficient. However, you use the dreaded "SELECT *". Do you actually use all columns returned by this?



Covering index? Is there such thing in the sql server 05? Regarding the select * as I said I am not using in query I specify the columns that I want to pick.

"Covering" is a description of how the query compares to the index. It is not a technical term - if an index exists, it can potnetially cover.

Index selection is a complex topic and is absolutely determined by the statements you run. All the other stuff mentioned is secondary to the queries you are running. I have often used highly unselective columns to lead indexes because that is what the queries require. In short - you must consider your EXACT queries, not approximations. You can't say "what is the optimal indexing strategy for some queries that are a bit like this?". You must be precise.
Go to Top of Page
   

- Advertisement -