Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
waveform
Yak Posting Veteran
93 Posts |
Posted - 2008-03-27 : 12:10:51
|
Hi everyone.Simple question here.. just wondering what the established best-performance approach is to setting up indexes to reflect one-to-many relationships between tables. To take the classic Authors and Books example - many books to 1 author - I want to search my Books table by Author, then by Title. So my query would be:SELECT a.LastName, a.Initial, b.Title FROM Books b INNER JOIN Authors a ON b.AuthorID = a.AuthorID ORDER BY a.LastName, a.Initial, b.TitleAs you can see I want to sort by author names, then by book titles. The way I would currently index the Books table is as follows:Primary Key: BookID (Incremented Identity)Secondary Key: AuthorID, TitleSo is that best-practice for efficient JOIN queries? Or should I have 2 *separate* indexes, one just for AuthorID (the foreign key), and the other just for book Title? I've often wondered this because, for each possible JOIN query, my indexes include AuthorID as the first field. For example, a query by Author then by books' PublishDate, would require an index containing AuthorID and PublishDate. I've often wondered if I can get away with a single index for AuthorID and the other indexes only including the other fields I want to sort/find by.Hope I've explained it well enough.. and hope I have the theory right. Thanks for any advice! |
|
LoztInSpace
Aged Yak Warrior
940 Posts |
Posted - 2008-03-27 : 19:11:34
|
Not sure what you mean by secondary key here. It looks like you are describing a composite key, which has its uses but not here.A rule of thumb is to always index your PK and any FKs. Then consider an index on anything you regularly use in WHERE, GROUP, ORDER BY. Your may have to play around to get a good balance.Pretty much though use 1 index per job although an index on a,b,c can help where a or a and b are used, i.e. the 'leading edge' is useful. |
|
|
waveform
Yak Posting Veteran
93 Posts |
Posted - 2008-03-27 : 23:49:31
|
Thanks Lozt, so you're saying the Books table should index the foreign key separately, hence:Index 1: Clustered primary on the BookID ident field,Index 2: Non-clustered on the AuthorID field (foreign key).Index 3: Non-clustered on the book Title field.I always just assumed that including the foreign key in my indexes (eg. compound AuthorID,Title) helped with the INNER JOIN operation. But you're saying it's more efficient to put the foreign key in its own separate index, yes? |
|
|
LoztInSpace
Aged Yak Warrior
940 Posts |
Posted - 2008-03-30 : 20:38:56
|
An index can only be used if it's 'leading edge' is useful to the predicate/join. A column from the middle of the index cannot be used unless the columns before it are somehow included. So yes, in general you will need a separate index for every operation such as filter, join etc. Use the explain query plan to see what it is using. |
|
|
waveform
Yak Posting Veteran
93 Posts |
Posted - 2008-04-06 : 06:18:46
|
> "An index can only be used if it's 'leading edge' is useful to the predicate/join."Thanks Lozt, I know what you're saying.. but in this example:"SELECT b.BookTitle FROM Books b INNER JOIN Authors a ON b.AuthorId = a.AuthorId WHERE a.LastName = 'Chomsky' ORDER BY b.BookTitle"We're using the fields "AuthorId" AND "BookTitle" in table Books - both to perform the join and the sort. So if my Books index had a composite key of "AuthorId, BookTitle" doesn't that exactly match the query I'm trying to perform? This is what I mean by including the foreign key in the index.. does that make sense? |
|
|
LoztInSpace
Aged Yak Warrior
940 Posts |
Posted - 2008-04-08 : 23:42:14
|
quote: Originally posted by waveformSo if my Books index had a composite key of "AuthorId, BookTitle" doesn't that exactly match the query I'm trying to perform?
Not really unless there was a unique constraint on Author, and even then I don't know if the optimiser would be smart enough.Consider where there is a Noam (ID of 1) & Bill Chomsky (ID of 2) in your author ID. Your resulting set from the join may be AuthorID, BookName1, Aardvark keeping1, Bee keeping1, Zebra keeping2, Quail Hunting2, Pig Hunting2, Rabbit HintingYou can't sort this by name using the index you have unless you (or rather the optimiser) knows only one author can be returned.WHERE B.AUTHORID=<constant> might use the indexBest play with the query analyser but my gut feel would be to go 2 indexes. |
|
|
waveform
Yak Posting Veteran
93 Posts |
Posted - 2008-04-09 : 05:47:41
|
Ok I see.. my the query I gave does indeed return books from just one author, but I think you mean the "WHERE a.LastName = 'Chomsky'" part might confuse the optimiser because we're filtering *both* tables at the same time in the query.I'm fairly new to this so haven't played with query optimiser tools - what's the simplest tool to use, which gives me the details on how much time a query takes to run in SQL Server? Is run-time the only factor I need to worry about, or also things like memory/disk usage? |
|
|
LoztInSpace
Aged Yak Warrior
940 Posts |
Posted - 2008-04-09 : 22:08:23
|
It's not so much confusing the optimiser - more that you are asking the impossible. Step through the example I gave and you will understand why it cannot use the index. You are showing a case where it could work because you know there is only one author called "chomsky". I am showing that there are cases where it will not work, therefore the general case is that it will not use the index. The opmisiser has to assume that it cannot use the index for the sort because there are cases when it simply doesn't work. You may know stuff about your data but the optimiser doesn't. Like I said, if you put a unique constraint on LastName and keep the unique author ID it *might* get it (but I bet your data would break that in a second).As to writing queries, I can only offer a few basic tips (that don't really answer your question!)- Model your data correctly- Understand the model and the results you need from a query.- Write your query to get the results you need. Don't worry about how SQL Server works it out, at least at first- Always have a PK- Always put unique & not null constraints where they exist in your data model- Always index FKs- Indexes on things in WHERE clauses can often help.- If you use DISTINCT there is a 75% chance you are doing something wrong and just hiding something that isn't working properly- Always join with the least restrictive join type, don't use OUTER joins 'to be sure'. See first & second points- When looking at an execution plan, remember that full table scans are not always a bad thing. Indexes are not always a good thing- Make sure your conditions & joins do not contain functions, or where they do, try to move the function to the 'constant' side.- If you want a sort, use ORDER BY. Don't just assume it will come out in the right order because it looks like it, or you do a GROUP BY/DISTINCT or whatever.Hope that helps a bit. |
|
|
|
|
|
|
|