when i create cluster index ? - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

New to SQL Server Programming

when i create cluster index ?

Author

Topic

chbala85
Starting Member

49 Posts

Posted - 2013-05-21 : 08:46:44

I'm in the middle of developing a project on my own (first time for me for this scale) and it started in a rush so my tables are kind of miserable. I need to add proper index' and need some help.

I'm confused, need to clarify a few points:

Should I use index for non-int columns? Why/why not
I've read a lot about clustered and nonclustered index yet I still can't decide when to use one. A good example would help me and a lot of other developers.
I know that I shouldn't use indexes for columns or tables that are often updated. That is in pocket for now. What else should I be careful about and how can I know that it is all good before going to test phase?

James K
Master Smack Fu Yak Hacker

3873 Posts

Posted - 2013-05-21 : 10:13:19

There is a lot there in the questions you are asking; take a look at this article and some of the links from there (Cluster Index Design Guidelines etc.) http://msdn.microsoft.com/en-us/library/ms191195(v=sql.105).aspx

In general, the nature of your data and the queries you want to run against them determine the indexes you need and deterimine which one you pick as the clustered index.

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2013-05-21 : 12:15:21

I think there is a deal of confusion here for you. In particular I'm not sure if you are sure of the distinction between *keys* which identify rows and *indexes* which help you find things. I'm bored so I'll try and help in detail!

Lets go over the points:

quote:

Should I use index for non-int columns? Why/why not

Of course. An index is used to find things. Think about a phone book. It has an index of names -- in SQL Server land something like:


CREATE INDEX names ON tbPhoneBook ([Surname], [Firstname], [Address])

This means that any searches that look at the Surname will be able to quickly find relevant rows (this index doesn't help you look for things unless you know the surname). Imagine trying to find all the phone numbers of people with first name 'John' in the phone book -- you have to scan every page.

However if you want to know all the phone numbers of people with surname 'Blogs' -- then you go right to the relevent page and read them off. If you want to find people with surname 'Blogs' and Firstname 'John' then it's even faster....

quote:

I've read a lot about clustered and nonclustered index yet I still can't decide when to use one. A good example would help me and a lot of other developers.

Lets go back to the phone book analogy. The phone book is printed in order (surname, firstname, address). Physically the data in the phonebook is arranged in that way. This is the *clustered index*. It's printed in that way because this is the most frequent use case for the book -- people want to find the phone numbers of people they know the names of.

A table in SQL Server is stored as a balanced tree -- the Clustered index is that structure. That's why every table can only have 1 clustered index.

Choosing the Clustered index is one of the most important factors in database design but choosing the right clustered index depends on what your data looks like, how many inserts you get, what key the data has.

Lets imagine a table of customers and think about keys: We store columns [FirstName], [Surname], [CustomerID], [Address], [DateOfBirth]

The [CustomerID] column here is for us to uniquely identify out customer -- there could be more than one person with identical details (families where the names are handed down for example).

For us, it's convenient to make up a key ([CustomerID]) which is simply a number because we have no way of deterministically referring to customers otherwise. So we'll model that table like so:


CREATE TABLE Customers (
	  [CustomerID] INT IDENTITY(1,1)
	, [CustomerFirstName] NVARCHAR(255)
	, [CustomerSurname] NVARCHAR(255)
	, [CustomerAddress] NVARCHAR(512)
	, [CustomerDataOfBirth] DATE
	
	, CONSTRAINT PK_CustomerID PRIMARY KEY CLUSTERED ([CustomerID])
	)

Have a look at the Primary key declaration -- we say that this table's primary key is on the CustomerID column. This means that as we add more customers they will be added to the CLUSTERED INDEX in order (as a new customer gets a greater ID than any existing customer). This is important as it means that INSERTS do not cause us to shuffle the table order.

Imagine if we had to add more people to the phone book -- the clustered index of the phonebook is surname, firstname, address. Adding someone to the middle means we have to shift all the other names around the pages. And Names of new people are random.

This kind of primary key is called a SURROGATE KEY (it doesn't *mean* anything in an of itself).

This is obviously useless for us to find anyone unless we know their CustomerID. We therefore make INDEXES on other columns to help us find customers.

Example -- a NONCLUSTERED Index on Firstname:


CREATE INDEX IX_Customers_FirstName ON Customers ([CustomerFirstName])

This index lets us quickly find people with firstname 'John'. By looking up the index we get back a list of JOHN entries.


SELECT * FROM Customers WHERE [CustomerFirstName] = 'John'

Technically this query would get the keys for every John Row and then use the keys to fetch the rest of the information (you'd see a Clustered Index lookup) in the query plan...

quote:

I know that I shouldn't use indexes for columns or tables that are often updated. That is in pocket for now. What else should I be careful about and how can I know that it is all good before going to test phase?

It's a tradeoff -- if you need the index to support rapid retrieval then you will need to live with the impact of your Inserts and Updates.

Updates will actually be *helped* not hindered by indexes that will help the engine find the relevant rows more quickly.

Hope this helps out somewhat. I'd like to know how you are modelling your data -- maybe if you gave a little more information we'd be able to help in detail?

Cheers,
Charlie.

Transact Charlie

Msg 3903.. The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION.

http://nosqlsolution.blogspot.co.uk/

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources