Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2006-05-16 : 15:07:33
|
Hi,We have a 3m rows large table with a nvarchar(400) column.That column is indexed, and there are some searches on that column with the format:LIKE 'OrderCode=SAMESTRING_23_12_2005:abc:def%'Now the problem is that about 80% of the rows start with, the same string: "OrderCode=SAMESTRING_", this essentially makes the sargable index seek return 2m + rows.Sql Server intermittedly does index seeks & scans on the table.The point is that it is too slow, and I was wondering if just removing the duplicate characters "OrderCode=SAMESTRING_" from the data would help.Would it make the index actually work?rockmoose |
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2006-05-16 : 15:20:45
|
Probably, it certainly won't make the index any less selective. |
 |
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2006-05-16 : 15:30:39
|
If you removed the string from the column, how selective would the data be? I can't remember exactly what I had read, but it said something to the effect that the first column in the index must have 90% or higher selectivity in order for it to be selected. I'm not sure if the 90% is accurate. But the point is that the index must be highly selective, otherwise SQL Server will probably not use it.Tara Kizeraka tduggan |
 |
|
cmdr_skywalker
Posting Yak Master
159 Posts |
Posted - 2006-05-16 : 15:31:56
|
yes, you can 1. remove the redundant string, save space and help performance if that is an option. Consider why it was saved in that particular format in the first place? An archaic but existing application perhaps?2. use other index in the WHERE clause that are selective (i.e. order dates?)3. user the charindex instead of like. consider the limitation though.4. Modify the like to use [] instead of %, if possible.Note, the more index, the better the select but the slower the DML (INSERT, etc.). You have to consider what is best based on the business requirements and technical resources that you have (space, cpu, maintenance, etc.)May the Almighty God bless us all! |
 |
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2006-05-16 : 15:35:08
|
Yes, that was what I thought. Thanks.This whole exercise has made me think about how sql server goes about sarging a wide character index with low selectivity for the first n characters.For instance how many characters are used for seeking into the index.I did this: LIKE 'OrderCode=SAMESTRING_23_12_2005%'according to query plan, there was an index seek which returned 2.4m rows!The # of rows returned after the filter was 7!The selectivity is very high, the information is almost unique in the column, but low selectivity in the beginning of the data.When I first created the index, I thought that there would be no problem for sql server to use it.And when searching for data not beginning with the killer string, then it is lightning fast.rockmoose |
 |
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2006-05-16 : 15:49:30
|
Thanks skywalker.1. At least this part "OrderCode=", is just something that the application smacks on the data, and removes when presenting it. So it's just waste of space.2. I considered that, The problem is that there would be more changes involved, change the sproc, change the application logic so that it provides a decent range of dates, etc..3. I will definitely try that out, thanks for the suggestion4. I don't think it is applicable hererockmoose |
 |
|
Kristen
Test
22859 Posts |
Posted - 2006-05-16 : 23:04:19
|
Much as I hate VIEWs with Indexes you could put one on this:CREATE VIEW MyViewASSELECT *, [IndexCol] = SUBSTRING([RealCol], 22, 400-22)FROM MyTablethen create an index on IndexCol. You'd need to query the View instead of the Table though, and given all of that you may be better off just changing the way the application holds data for that column!Could you get the same effect with a calculated column in the table - can that be indexed?Mind you, I hate those darn things too! Mucks up trying to insert into the table via a view, if you have one, and needing SET ARITHABORT ON or one of those blighters ...Yeah, on reflection modify the application!Kristen |
 |
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2006-05-17 : 00:02:59
|
If you are always searching for a certain portion of this column, you might consider adding another column (or computed column) to the table with a CHECKSUM of just that part of the data and putting an index on that.The index search should be highly selective, because it could find the small number of rows matching the CHECKSUM, and then apply the LIKE.Since a CHECKSUM is an integer, the overhead for that type of index is much lower than for the full 400 character column. If all of your searches use, for example, the first 40 characters, and that is highly selective, you migh be able to get rid of the current index, and just have a CHECKSUM index on the first 40 characters. Even several CHECKSUM indexes convering different parts of the column might use less space than the current index and give you better performance.where COL_CHECKSUM = CHECKSUM('23_12_2005') and COL like 'OrderCode=SAMESTRING_23_12_2005:abc:def%' CODO ERGO SUM |
 |
|
Kristen
Test
22859 Posts |
Posted - 2006-05-17 : 01:00:59
|
That's a nice idea MVJ. |
 |
|
eyechart
Master Smack Fu Yak Hacker
3575 Posts |
Posted - 2006-05-17 : 01:03:56
|
that is a really good idea MVJ. I have done similar work, but used a hash value instead. Now that I think about it, a checksum would be better since it is faster to compute and the fact that there might be a small number of dupes is not really of any significance.-ec |
 |
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2006-05-17 : 10:33:19
|
Speaking of overlooking the obvious , I see that the query you posted has wildcard underscore characters in it. You may get your query to be more selective by using escape characters to disable the underscore wildcard.where like 'OrderCode=SAMESTRING_23_12_2005:abc:def%'Change to:where like 'OrderCode=SAMESTRING^_23^_12^_2005:abc:def%' escape '^' CODO ERGO SUM |
 |
|
mmarovic
Aged Yak Warrior
518 Posts |
Posted - 2006-05-17 : 11:10:29
|
quote: Originally posted by rockmoose Hi,We have a 3m rows large table with a nvarchar(400) column.That column is indexed, and there are some searches on that column with the format:LIKE 'OrderCode=SAMESTRING_23_12_2005:abc:def%'Now the problem is that about 80% of the rows start with, the same string: "OrderCode=SAMESTRING_", this essentially makes the sargable index seek return 2m + rows.Sql Server intermittedly does index seeks & scans on the table.The point is that it is too slow, and I was wondering if just removing the duplicate characters "OrderCode=SAMESTRING_" from the data would help.Would it make the index actually work?rockmoose
I don't know what kind of information "OrderCode=SAMESTRING_" offers, however I would either remove it from the column if it doesn't bring any info, or I would create another column where that info would be stored. Even better if this is something that could be placed in lookup table I would move that long description in lookup table and make the smallest int type column that refers the lookup table.The benefit wouldn't be higher column selectivity, but narrower index values which helps performance allowing more index rows to fit the buffer.Selectivity of column prefix doesn't matter, statistics are maintained on column level. |
 |
|
Kristen
Test
22859 Posts |
Posted - 2006-05-17 : 11:16:52
|
Good point that .... create two new columns MyColFirstHalf and MyColSecondHalf, and then a calculated column, with the original name, that concatenates them - then maybe an INSTEAD OF trigger that splits them out as they are saved (only if you can't alter the application/Sproc to do the job) - so all applications that reference the original column name in SELECTs will continue to workKristen |
 |
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2006-05-19 : 14:14:48
|
Thanks all for the great input.Indexed computed columns and indexed views I ruled out pretty quickly because of all restricitions and muck with them.I tried charindex iof like, which resulted i table scans iof index seek, and the performance just as bad. The table also does not carry a clustered index, but does have an identity. I have deferred rebuilding it but will do so, and in the process remove the overhead of "OrderCode=".If that does not help perf, I will probably add the checksum index that MVJ suggested.Mind you, I have done that before in a couple of cases (spanning several columns, not part of a column), so I did not think of it here .Just thought I'd post an update and say thanx!rockmoose |
 |
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2006-05-19 : 14:36:13
|
If you add the checksum column, it is a good idea to add a check constraint to make sure the checksum and column value don't get out of sync. That is one thing in favor of a computed column; they can't get out of sync.checksum(substring(MyCol,1,40)) = MyChecksumCol CODO ERGO SUM |
 |
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2006-05-19 : 15:27:00
|
quote: Originally posted by Michael Valentine Jones If you add the checksum column, it is a good idea to add a check constraint to make sure the checksum and column value don't get out of sync. That is one thing in favor of a computed column; they can't get out of sync.checksum(substring(MyCol,1,40)) = MyChecksumCol
Hmm, but that could cause problems for dml operations, unless there are instead-of triggers.rockmoose |
 |
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2006-05-19 : 16:28:50
|
quote: Originally posted by rockmoose
quote: Originally posted by Michael Valentine Jones If you add the checksum column, it is a good idea to add a check constraint to make sure the checksum and column value don't get out of sync. That is one thing in favor of a computed column; they can't get out of sync.checksum(substring(MyCol,1,40)) = MyChecksumCol
Hmm, but that could cause problems for dml operations, unless there are instead-of triggers.rockmoose
If you add a new column, you have to maintain it some way. I'm just suggeting adding a constraint to make sure the app maintains both columns in sync. Otherwise, what's the point?A trigger might be a good alternative. A computed column is probably the least effort as far a changing DML, and it really does fit the definition of what a computed column should be.CODO ERGO SUM |
 |
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2006-05-19 : 16:49:23
|
For the computed column I am concerned about the:quote: BOL: The connection on which the index is created, and all connections attempting INSERT, UPDATE, or DELETE statements that will change values in the index, must have six SET options set to ON and one option set to OFF. The optimizer ignores an index on a computed column for any SELECT statement executed by a connection that does not have these same option settings
I can set these in the sproc that does the search (and insert).But what happens if a connection does INSERT, UPDATE, or DELETE without those options?Nothing?, just the index not being used, or is there an error?rockmoose |
 |
|
cmdr_skywalker
Posting Yak Master
159 Posts |
Posted - 2006-05-19 : 17:57:31
|
The CHARINDEX converts the other expression to Unicode if the other is not. So, make sure both are in the unicode type. In general, can you paste the whole query and execution plan? The problem may lie somewhere else. Do you have a baseline? If not, run this baseline generator on the target database and another run on a database less used (say master). Let us know the result: (if you have a high end machine, set the limit to a higher number)PRINT '--------------------------------------------------------------------------'PRINT '-= Performance Baseline Generator v1.0 =-'PRINT '--------------------------------------------------------------------------'PRINT 'Author: Perez, Jeffrey 'PRINT 'Date: ' + CONVERT(VARCHAR, GETDATE(), 100)PRINT 'Server: '+ CAST(SERVERPROPERTY('ServerName') AS VARCHAR)PRINT 'User: ' + SUSER_SNAME() + ', ' + CURRENT_USERPRINT 'Workstation: ' + HOST_NAME()PRINT 'Version Info:' PRINT ' ' + REPLACE(@@VERSION,CHAR(9),' ')PRINT '--------------------------------------------------------------------------'GOPRINT 'Running Test...'GOIF (SELECT OBJECT_ID('test')) IS NOT NULL DROP TABLE testGOCREATE TABLE test( SAID INT IDENTITY(1,1) NOT NULL PRIMARY KEY, DATA CHAR(8000))GODECLARE @Counter INT, @Start DATETIME, @End DATETIME, @Limit INTSET NOCOUNT ONSET @Start = GETDATE()SET @Counter = 1SET @Limit = 10000WHILE @Counter < @LimitBEGIN INSERT INTO TEST(DATA) SELECT SPACE(8000) SET @Counter = @Counter + 1ENDSET @End = GETDATE()PRINT 'Test Complete.'PRINT ''PRINT 'Database: ' + DB_NAME()PRINT 'Start Time:' + CONVERT(VARCHAR, @Start,114)PRINT 'End Time:' + CONVERT(VARCHAR, @End,114)PRINT 'Baseline Mark: ' + LTRIM(STR(@Limit/DATEDIFF(ss, @Start, @End))) + ' transaction/s'GOIF (SELECT OBJECT_ID('test')) IS NOT NULL DROP TABLE testGOPRINT '--------------------------------------------------------------------------'May the Almighty God bless us all! |
 |
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2006-05-19 : 18:02:48
|
quote: Originally posted by rockmoose For the computed column I am concerned about the:quote: BOL: The connection on which the index is created, and all connections attempting INSERT, UPDATE, or DELETE statements that will change values in the index, must have six SET options set to ON and one option set to OFF. The optimizer ignores an index on a computed column for any SELECT statement executed by a connection that does not have these same option settings
I can set these in the sproc that does the search (and insert).But what happens if a connection does INSERT, UPDATE, or DELETE without those options?Nothing?, just the index not being used, or is there an error?rockmoose
The insert or update will fail if the settings are not correct.This code shows what happens.Create a table to test with:drop table [dbo].[MY_CHECKSUM_TEST]GOCREATE TABLE [dbo].[MY_CHECKSUM_TEST] ([ID] [int] IDENTITY (1, 1) NOT NULL ,[MY_TEXT] [varchar] (60) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,[MY_CHECKSUM] AS (checksum([MY_TEXT])) )GOALTER TABLE [dbo].[MY_CHECKSUM_TEST] ADD CONSTRAINT [PK_MY_CHECKSUM_TEST] PRIMARY KEY CLUSTERED ([ID])GOset ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL, ARITHABORT, QUOTED_IDENTIFIER, ANSI_NULLS on GOset NUMERIC_ROUNDABORT off GOCREATE INDEX [IX_MY_CHECKSUM_TEST_MY_CHECKSUM]ON [dbo].[MY_CHECKSUM_TEST]([MY_CHECKSUM])GOset arithabort OFF GOset numeric_roundabort OFF GOset quoted_identifier OFF GO Test inserts and updates:set QUOTED_IDENTIFIER offset arithabort offgoprint 'Bad Insert Settings'goinsert into [dbo].[MY_CHECKSUM_TEST] ( [MY_TEXT] )select [MY_TEXT] = 'aaa '+convert(varchar(400),newid())goset QUOTED_IDENTIFIER onset arithabort ongoprint 'Good Insert Settings'goinsert into [dbo].[MY_CHECKSUM_TEST] ( [MY_TEXT] )select [MY_TEXT] = 'bbb '+convert(varchar(400),newid())goset QUOTED_IDENTIFIER offset arithabort offgoprint 'Bad Update Settings'goupdate [dbo].[MY_CHECKSUM_TEST]set [MY_TEXT] = 'xxxx '+ [MY_TEXT]goset QUOTED_IDENTIFIER onset arithabort ongoprint 'Good Update Settings'goupdate [dbo].[MY_CHECKSUM_TEST]set [MY_TEXT] = 'zzzz '+ [MY_TEXT]goselect * from [dbo].[MY_CHECKSUM_TEST]gotruncate table [dbo].[MY_CHECKSUM_TEST] Results:Bad Insert SettingsServer: Msg 1934, Level 16, State 1, Line 1INSERT failed because the following SET options have incorrect settings: 'QUOTED_IDENTIFIER, ARITHABORT'.Good Insert Settings(1 row(s) affected)Bad Update SettingsServer: Msg 1934, Level 16, State 1, Line 1UPDATE failed because the following SET options have incorrect settings: 'QUOTED_IDENTIFIER, ARITHABORT'.Good Update Settings(1 row(s) affected)ID MY_TEXT MY_CHECKSUM ----------- ------------------------------------------------------------ ----------- 1 zzzz bbb B661FF6B-BA03-4042-B717-98F6E4496D70 -1287381225(1 row(s) affected) CODO ERGO SUM |
 |
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2006-05-19 : 19:00:19
|
Thank you for testing MVJ! Now that pretty much rules out any indexed computed columns, because the last thing I want is funny errors creeping up somewhere.So I will need to implement iof triggers, or just take my allergic to null medecine and use regular triggers.This is great, learning something every day.rockmoose |
 |
|
Next Page
|
|
|
|
|