How to Index Something? - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

New to SQL Server Programming

How to Index Something?

Author

Topic

larryg003
Starting Member

18 Posts

Posted - 2010-06-14 : 19:19:01

Hi guys,
I would really appreciate it if you could help me go about a SQL problem I am having.

I have 15 columns and 42 million rows of data that I am trying
to run queries on. Unfortunately the queries, although simple are
taking a lot longer than I expected.

How would I go about indexing the database to make the queries faster?
Would I just do:
CREATE INDEX index_name
ON table_name (column_name)

to all 15 columns of data?

Also what's the difference between an index vs. a unique index?

Lastly (most importantly) say for example out of my 15 columns I have two called [RELATED_TGB] and [UNIQUE_NUMBER]
say I wanted to run the query:

select
count distinct "UNIQUE_NUMBER"
where [RELATED_TGB] like '%ABC%'

What would I need to index and how to run this query faster?

Thank you for all of your help,

-Larry G.

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-06-14 : 19:39:43

You should definitely not index each column. 15 indexes is way too many. You'll kill performance of inserts, updates, and deletes.

The most indexes I'll put on a table is 5-7.

The difference between a normal index and a unique index is that a unique index requires the data to be unique. So if your index is on Column1 and Column2 (composite index) and you may it unique, then each pair together must be unique.

Unfortunately your query is going to be slow due to the leading %. You can index RELATED_TGB, but it's not going to be super fast due to the nature of the query.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

jimf
Master Smack Fu Yak Hacker

2875 Posts

Posted - 2010-06-14 : 19:42:52

You should check out Books On Line for more about indexes. A unique index is just that, it is a column or columns that make the row unique. For example, an index on state alone probably isn't unique, but maybe an index on state, agencynumber is. What you want to put in your index are the columns that appear in your where clause or your joins, plus the data that you're after, if applicable. An index on [RELATED_TGB] might be useful, but not if you put a function on it (including "LIKE"). Creating a unique index on "Unique_Number" is a good place to start. But the ultimate answer to your question is "It depends". It depends on the needs of each query, it depends on the table, and mostly it depends on looking at the execution plan (hit ctrl + l, if you have permissions on it). CREATE UNIQUE INDEX idx1 on myTable(Unique_Number,related_TGB) will probably help.

Jim

Everyday I learn something that somebody else already knew

Kristen
Test

22859 Posts

Posted - 2010-06-14 : 20:38:37

You should tune your queries - not build indexes to your data.

So if


select 
count distinct "UNIQUE_NUMBER"
where [RELATED_TGB] like '%ABC%'

is worthy of being tuned (e.g. it is a common query and people have to wait too long for results, or it only runs once a month but takes hours to run and that impacts other people using the server) then you need to consider how to improve it.

Experience counts for a lot, of course, but you can also view the Query Plan that SQL is using for the query (experience definitely important in interpreting Query Plans I'm afraid ...)

With no particular indexes what you will most probably see is a Query Plan that says "Scan through the whole table looking for "[RELATED_TGB] like '%ABC%'"

You could add an index on [RELATED_TGB]. Then the query plan would most probably show "Scan through the [RELATED_TGB] index looking for values like '%ABC%'"

However - you might find that SQL will ignore your new index and still use the original "Scan through the whole table looking for "[RELATED_TGB] like '%ABC%'"

Why?

Well ... SQL maintains statistics about the distribution of values in each index; if it estimates that looking in the index for '%ABC%', and THEN looking up each row in the actual table, will take longer than just scanning the table, then it will choose to just scan the table ...

LIKE '%XXX%' is a good case in point. An index doesn't help very much, the whole index has got to be scanned to find all the matching entries.

If you had said:

where [RELATED_TGB] like 'ABC%'

then an index becomes much more useful. SQL could do "Find the lowest value starting with "ABC", and then read the index sequentially until the Last value starting with "ABC". As they are all grouped sequentially in the index the index will help a lot as only that block of the index has to be read.

However, maybe 99% of the entries in the index start with "ABC" ... again, the index is not much help because for each entry found SQL has to then find the corresponding record in the table, and thus for a 99% hit it will be faster to just scan the table rather than then index+table. SQL uses the Statistics to make that "guess". If 99% of the entries did start with "ABC" then a query looking for

where [RELATED_TGB] like 'XYZ%'

would be in the remaining 1% and the statistics would give SQL a "guess" that it was well worth using the index.

This is described as being how "selective" it is.

Then we come to the

count distinct "UNIQUE_NUMBER"

bit.

Lets say that the index was not just the single column [RELATED_TGB], but had two key columns [RELATED_TGB], [UNIQUE_NUMBER]

Now SQL has the information it needs from JUST the index - it doesn't have to go to the Record at all. (The index so called "covers" the query). This can make a dramatic improvement in performance.

However, setting up a complex index like this for every possible query is counter productive, as Tara already described. The more indexes you have the more index-manipulation is needed every time you insert a record, or update one.

If your database is Read-Only then adding more and more indexes can help, up to a point, but even then you get to a point where it takes SQL ages (relatively speaking!) to decide WHICH index it should use to solve a particular query.

So ... no magic bullet.

One suggestion: Take a representative query, run it, create an index which you think may help, re-run the query, compare the two.

SQL will cache data and query plans, so a simple stop-watch-timing is probably not going to be representative - other people using the server, data in the disk controller cache, all these things will vary the outcome. But the Query Plan will show you the "Logical I/O", number of Table Scans, and some estimates. These can be compared from one experiment to the next to give a reasonable indication of how queries perform relative to each other.

If you have a DEV server you can also "empty" the Cache before each test (but don't do that on a server that other people are actively using!) then, given that you have many millions of rows of data, so queries will take a measurable amount of time anyway, stopwatch-timing will probably be a pretty reasonable experimental method.

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-06-15 : 01:38:50

These may be worth a read
[url]http://www.simple-talk.com/sql/performance/finding-the-causes-of-poor-performance-in-sql-server,-part-1/[/url]
[url]http://www.simple-talk.com/sql/performance/finding-the-causes-of-poor-performance-in-sql-server,-part-2/[/url]

--
Gail Shaw
SQL Server MVP

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources