| Author |
Topic |
|
R
Constraint Violating Yak Guru
328 Posts |
Posted - 2010-09-09 : 11:38:35
|
| I have a table with the following columns:[ID] int Identity PK,[uID] int FK,[sID] int FK,[gID] int FK,[info1] nvarchar(max),[info2] nvarchar(max),[info3] nvarchar(150),[info4] nvarchar(50),[date1] datetime,[date2] datetimeThis table currently has about 16,000 rows, and in time will surpass ten million rows (at least). I need to perform lots of SELECT queries using the foreign key (integer) columns. I have created an index that contains all three columns, but wondered is this is the best approach? Basically I want to retrieve rows in the fastest possible time.Should I have an index that contains all three columns, or three indexes each containing one column? I'm new to indexing but want to adopt best-practice from the start...! |
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2010-09-09 : 11:48:46
|
| It depends.Can you post some examples of the queries that will run against this table?Is the primary key the clustered index?This series (3 parts) may be worth a read: http://www.sqlservercentral.com/articles/Indexing/68439/--Gail ShawSQL Server MVP |
 |
|
|
R
Constraint Violating Yak Guru
328 Posts |
Posted - 2010-09-10 : 03:31:59
|
Hi GailYes, the ID column is the clustered index. I believe all the queries executed on this table contain the WHERE clause...WHERE (uID = @uID) AND (sID = @sID) AND (gID = @gID) Some stored procedures search for additional columns as well, e.g.WHERE (uID = @uID) AND (sID = @sID) AND (gID = @gID) AND (info1 LIKE '%'+@info1+'%' OR @info1 IS NULL) AND (date1 < @date1 OR @date1 IS NULL) |
 |
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2010-09-10 : 07:35:37
|
| Then at minimum you need an index with the 3 foreign key columns as the first three key columns. If there's another column that's frequently included in the where clause then put that as the 4th column/--Gail ShawSQL Server MVP |
 |
|
|
R
Constraint Violating Yak Guru
328 Posts |
Posted - 2010-09-10 : 08:51:39
|
| Okay thanks Gail.Please tell me, does it make a difference whether there's one index containing all three columns, or having three indexes each containing one column? |
 |
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2010-09-10 : 10:46:14
|
| A lot. Typically (not always but usually) SQL uses one index per table in a query. So an index with 3 columns it can use for a 3-column seek and get straight to the required records. Three single column indexes means SQL typically uses one of them, fetches rows that match one of the 3 conditions, then goes and looks up the rest of the columns and does a secondary filter to get rid of rows that don't match the other two conditions.Did you read those articles I referenced? If not, please do, and these two as wellhttp://sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/http://sqlinthewild.co.za/index.php/2009/02/06/index-columns-selectivity-and-inequality-predicates/--Gail ShawSQL Server MVP |
 |
|
|
R
Constraint Violating Yak Guru
328 Posts |
Posted - 2010-09-10 : 11:56:18
|
| Okay great. Leave those links with me... |
 |
|
|
X002548
Not Just a Number
15586 Posts |
|
|
namman
Constraint Violating Yak Guru
285 Posts |
Posted - 2010-09-10 : 14:19:12
|
Yes, the ID column is the clustered index. I believe all the queries executed on this table contain the WHERE clause...WHERE (uID = @uID) AND (sID = @sID) AND (gID = @gID)Some stored procedures search for additional columns as well, e.g.WHERE (uID = @uID) AND (sID = @sID) AND (gID = @gID) AND (info1 LIKE '%'+@info1+'%' OR @info1 IS NULL) AND (date1 < @date1 OR @date1 IS NULL)Then at minimum you need an index with the 3 foreign key columns as the first three key columns. If there's another column that's frequently included in the where clause then put that as the 4th columnFor queries with condition as the red part, the index will be useless, doen't matter columns in the index. Index is not that simple, I think . |
 |
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2010-09-10 : 16:04:14
|
quote: Originally posted by X002548 So..are you saying it dosn't use INDEX INTERSECTION anymore?
It can. Doesn't often but can. That's why I said 'typically' and 'usually'.--Gail ShawSQL Server MVP |
 |
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2010-09-10 : 16:05:16
|
quote: Originally posted by namman For queries with condition as the red part, the index will be useless, doen't matter columns in the index. Index is not that simple, I think .
Um, no. The index won't be optimal, but it certainly won't be useless. Why do you feel that any index will be useless for that where clause?--Gail ShawSQL Server MVP |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2010-09-10 : 20:43:23
|
quote: Originally posted by GilaMonster
quote: Originally posted by X002548 So..are you saying it dosn't use INDEX INTERSECTION anymore?
It can. Doesn't often but can. That's why I said 'typically' and 'usually'.--Gail ShawSQL Server MVP
Ummm...really...not for nothing..but if that's true..I am a heart broken crushed 15 YO girl in a dress looking for somebody asking me to danceand if it is true....welcome to DB2So Merry-go-round is A real MYTHBrett8-)Hint: Want your questions answered fast? Follow the direction in this linkhttp://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspxAdd yourself!http://www.frappr.com/sqlteam |
 |
|
|
R
Constraint Violating Yak Guru
328 Posts |
Posted - 2010-09-24 : 08:53:08
|
Thank you very much for posting those links. Although it might take a while for it all to sink through my thick skull, the articles were very insightful. I have a few questions about them if I may...?quote: Typically (not always but usually) SQL uses one index per table in a query.
What's the reason for this please? This obviously has a huge impact on the effectiveness of my indexes!quote: A lookup occurs when a nonclustered index was used to locate rows for a query, but the nonclustered index did not contain all of the columns required for the query.
Would an example of this be if an INNER JOIN was used, but the joined table column was not part of the index?In another post on SQLTeam, I read that I shouldn't use query code that contains... WHERE someCol = @val OR @val IS NULL ...even if someCol is contained within an index because SQL reverts to performing a scan. Is this correct? If so (and I apologise for my ignorance here), why hasn't SQL been developed to recognise if @val is null, and then execute the query more efficiently using a seek?quote: ...there’s a single nonclustered index on all three columns, in the order ID, SomeDate, SomeString. If there’s an inequality predicate, then then the index is only fully seekable for the following queries… WHERE ID = @ID AND SomeDate = @dt AND SomeString > @str… WHERE ID = @ID AND SomeDate > @dt… WHERE ID > @ID
If the first example above was written asWHERE ID = @ID AND SomeString > @str AND SomeDate = @dt (with second and third predicates swapped around) would SQL be smart enough to execute the query using a seek only, or is column order essential in WHERE predicates? |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-09-24 : 09:42:18
|
"In another post on SQLTeam, I read that I shouldn't use query code that contains... WHERE someCol = @val OR @val IS NULL...even if someCol is contained within an index because SQL reverts to performing a scan. Is this correct? If so (and I apologise for my ignorance here), why hasn't SQL been developed to recognise if @val is null, and then execute the query more efficiently using a seek?"I don't see why that is bad (needs parenthesis around it though, assuming it is followed by an AND (Col2 = @Param2 OR @Param2 IS NULL). I have never bothered too much about the consequences, although for huge queries we do work around it.What I mean is you haven't really got a lot of choice if you have "optional" parameters from the user, so not much help someone telling you "Don't do it like that" You could use separate queries / sprocs for each combination of parameters provided - but that is not usually realistic, although it IS a good approach if you can streamline the query by cherry-picking from the parameters you are provided with. e.g. if a Parameter is provided for a column that is unique, or has very high selectivity, much better to shoot for that first rather than some other parameter that uses MyColumn LIKE '%' + @Param2 + '%'.We do things like:IF @Param1 IS NOT NULLINSERT INTO #TEMP (T_MyPK) SELECT MyPK FROM MyTable WHERE Col1 = @Param1ELSEIF @Param2 IS NOT NULLINSERT INTO #TEMP (T_MyPK) SELECT MyPK FROM MyTable WHERE Col2 LIKE '%' + @Param2 + '%'...SELECT Col1, Col2, Col3, ...FROM #TEMP JOIN MyTable ON T_MyPK = MyPKWHERE (@Param1 IS NULL OR Col1 = @Param1) AND (@Param2 IS NULL OR Col2 LIKE '%' + @Param2 + '%') Note that the @Parm1 test in the final query is redundant, so only included here by way of demonstration.The only other approach is dynamic SQL - which is probably a better bet because Query plans will be created for each variation of Parameters that are actually provided in real life, and popular ones will remain in the cache.With the advent of Certificates (available in SQL2008, maybe SQl2005 too I've forgotten) users can still just have Execute permission on the Sproc, even though the Sproc is doing dynamic SQL.DEFINE @strSQL nvarchar(4000)SELECT @strSQL = 'SELECT Col1, Col2, Col3, ...FROM MyTableWHERE 1=1'IF @Param1 IS NOT NULL SELECT @strSQL = @strSQL + ' AND Col1 = @Param1'IF @Param2 IS NOT NULL SELECT @strSQL = @strSQL + ' AND Col2 LIKE ''%'' + @Param2 + ''%'''EXEC sp_ExecuteSQL @strSQL, N'@Param1 int, @Param2 varchar(100)', @Param1 = @Param1, @Param2 = @Param2 NOTE: @Param1 & @Param2 are provided to sp_ExecuteSQL regardless of whether they are included in the query, or not, sp_ExecuteSQL doesn't care (although you can leave them out if you can easily accommodate that in your logic / can be bothered!sp_ExecuteSQL will create separate Query plans where @Param1, @Param2 and @Param1 + @Param2 are used - i.e. potentially 3 different query plans.Note that @strSQL only includes named @ Parameters - it does NOT concatenate the contents of the actual @Parameter - which would introduce the risk of SQL Injection. |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-09-24 : 09:43:43
|
| "(with second and third predicates swapped around) would SQL be smart enough to execute the query using a seek only, or is column order essential in WHERE predicates?"Yes the Query optimiser is smart enough. You don't need to worry about the order of your tests, or the order of criteria in a JOIN, etc. |
 |
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2010-09-24 : 12:10:22
|
quote: Originally posted by Rquote: Typically (not always but usually) SQL uses one index per table in a query.
What's the reason for this please? This obviously has a huge impact on the effectiveness of my indexes!
Another link: (inspired by this thread)http://sqlinthewild.co.za/index.php/2010/09/14/one-wide-index-or-multiple-narrow-indexes/It's most efficient to use one index, far less so to use multiple indexes and try to later combine them. I didn't do a whole lot of tests for that blog post (short on time), but I did notice that it didn't take much for SQL to stop intersecting three indexes. A little wider row and it dropped to two indexes and a key lookup. Doubt it would have taken much more for it to go for a single index and key lookup or just a table scanquote: In another post on SQLTeam, I read that I shouldn't use query code that contains... WHERE someCol = @val OR @val IS NULL ...even if someCol is contained within an index because SQL reverts to performing a scan. Is this correct?
Pretty much (and that would likely have been me saying so). http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/quote: If so (and I apologise for my ignorance here), why hasn't SQL been developed to recognise if @val is null, and then execute the query more efficiently using a seek?
Just because the parameter is null on this execution doesn't mean it'll be null on the next one.SQL caches its plans for reuse. Optimisation is expensive and it's something that you want to do as little as possible. SQL has one rule fundamental for plans - they MUST be safe for reuse. That is, if the plan is reused with a different set of parameters it must return the correct results.You can see, in the case of the @var IS NULL or col = @var, SQL cannot do a seek on col = @var if @var is not null on the first execution because, if it does, the results will be wrong on a subsequent execution where @var IS NULL.quote: If the first example above was written asWHERE ID = @ID AND SomeString > @str AND SomeDate = @dt (with second and third predicates swapped around) would SQL be smart enough to execute the query using a seek only, or is column order essential in WHERE predicates?
http://sqlinthewild.co.za/index.php/2007/07/29/order-of-execution/ (old post and not up to my current standards, but it'll do)--Gail ShawSQL Server MVP |
 |
|
|
X002548
Not Just a Number
15586 Posts |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-09-24 : 14:34:14
|
| Yeah, but there might be a test on another, indexed, column that narrows the effort. |
 |
|
|
X002548
Not Just a Number
15586 Posts |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-09-25 : 03:28:35
|
| Interesting. OK I'll try knocking up a test example. |
 |
|
|
|