Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 Old Forums
 CLOSED - General SQL Server
 Index Advice

Author  Topic 

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-05-16 : 15:07:33
Hi,

We have a 3m rows large table with a nvarchar(400) column.
That column is indexed, and there are some searches on that column with the format:
LIKE 'OrderCode=SAMESTRING_23_12_2005:abc:def%'

Now the problem is that about 80% of the rows start with, the same string: "OrderCode=SAMESTRING_", this essentially makes the sargable index seek return 2m + rows.
Sql Server intermittedly does index seeks & scans on the table.

The point is that it is too slow, and I was wondering if just removing the duplicate characters "OrderCode=SAMESTRING_" from the data would help.
Would it make the index actually work?


rockmoose

robvolk
Most Valuable Yak

15732 Posts

Posted - 2006-05-16 : 15:20:45
Probably, it certainly won't make the index any less selective.
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2006-05-16 : 15:30:39
If you removed the string from the column, how selective would the data be? I can't remember exactly what I had read, but it said something to the effect that the first column in the index must have 90% or higher selectivity in order for it to be selected. I'm not sure if the 90% is accurate. But the point is that the index must be highly selective, otherwise SQL Server will probably not use it.

Tara Kizer
aka tduggan
Go to Top of Page

cmdr_skywalker
Posting Yak Master

159 Posts

Posted - 2006-05-16 : 15:31:56
yes, you can
1. remove the redundant string, save space and help performance if that is an option. Consider why it was saved in that particular format in the first place? An archaic but existing application perhaps?
2. use other index in the WHERE clause that are selective (i.e. order dates?)
3. user the charindex instead of like. consider the limitation though.
4. Modify the like to use [] instead of %, if possible.

Note, the more index, the better the select but the slower the DML (INSERT, etc.). You have to consider what is best based on the business requirements and technical resources that you have (space, cpu, maintenance, etc.)

May the Almighty God bless us all!
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-05-16 : 15:35:08
Yes, that was what I thought. Thanks.

This whole exercise has made me think about how sql server goes about sarging a wide character index with low selectivity for the first n characters.
For instance how many characters are used for seeking into the index.

I did this: LIKE 'OrderCode=SAMESTRING_23_12_2005%'
according to query plan, there was an index seek which returned 2.4m rows!
The # of rows returned after the filter was 7!

The selectivity is very high, the information is almost unique in the column, but low selectivity in the beginning of the data.
When I first created the index, I thought that there would be no problem for sql server to use it.
And when searching for data not beginning with the killer string, then it is lightning fast.

rockmoose
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-05-16 : 15:49:30
Thanks skywalker.

1. At least this part "OrderCode=", is just something that the application smacks on the data, and removes when presenting it. So it's just waste of space.
2. I considered that, The problem is that there would be more changes involved, change the sproc, change the application logic so that it provides a decent range of dates, etc..
3. I will definitely try that out, thanks for the suggestion
4. I don't think it is applicable here

rockmoose
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2006-05-16 : 23:04:19
Much as I hate VIEWs with Indexes you could put one on this:

CREATE VIEW MyView
AS
SELECT *, [IndexCol] = SUBSTRING([RealCol], 22, 400-22)
FROM MyTable

then create an index on IndexCol. You'd need to query the View instead of the Table though, and given all of that you may be better off just changing the way the application holds data for that column!

Could you get the same effect with a calculated column in the table - can that be indexed?

Mind you, I hate those darn things too! Mucks up trying to insert into the table via a view, if you have one, and needing SET ARITHABORT ON or one of those blighters ...

Yeah, on reflection modify the application!

Kristen
Go to Top of Page

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2006-05-17 : 00:02:59
If you are always searching for a certain portion of this column, you might consider adding another column (or computed column) to the table with a CHECKSUM of just that part of the data and putting an index on that.

The index search should be highly selective, because it could find the small number of rows matching the CHECKSUM, and then apply the LIKE.

Since a CHECKSUM is an integer, the overhead for that type of index is much lower than for the full 400 character column. If all of your searches use, for example, the first 40 characters, and that is highly selective, you migh be able to get rid of the current index, and just have a CHECKSUM index on the first 40 characters.

Even several CHECKSUM indexes convering different parts of the column might use less space than the current index and give you better performance.


where
COL_CHECKSUM = CHECKSUM('23_12_2005') and
COL like 'OrderCode=SAMESTRING_23_12_2005:abc:def%'


CODO ERGO SUM
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2006-05-17 : 01:00:59
That's a nice idea MVJ.
Go to Top of Page

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2006-05-17 : 01:03:56
that is a really good idea MVJ.

I have done similar work, but used a hash value instead. Now that I think about it, a checksum would be better since it is faster to compute and the fact that there might be a small number of dupes is not really of any significance.



-ec
Go to Top of Page

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2006-05-17 : 10:33:19
Speaking of overlooking the obvious , I see that the query you posted has wildcard underscore characters in it. You may get your query to be more selective by using escape characters to disable the underscore wildcard.


where
like 'OrderCode=SAMESTRING_23_12_2005:abc:def%'

Change to:

where
like 'OrderCode=SAMESTRING^_23^_12^_2005:abc:def%' escape '^'


CODO ERGO SUM
Go to Top of Page

mmarovic
Aged Yak Warrior

518 Posts

Posted - 2006-05-17 : 11:10:29
quote:
Originally posted by rockmoose

Hi,

We have a 3m rows large table with a nvarchar(400) column.
That column is indexed, and there are some searches on that column with the format:
LIKE 'OrderCode=SAMESTRING_23_12_2005:abc:def%'

Now the problem is that about 80% of the rows start with, the same string: "OrderCode=SAMESTRING_", this essentially makes the sargable index seek return 2m + rows.
Sql Server intermittedly does index seeks & scans on the table.

The point is that it is too slow, and I was wondering if just removing the duplicate characters "OrderCode=SAMESTRING_" from the data would help.
Would it make the index actually work?


rockmoose


I don't know what kind of information "OrderCode=SAMESTRING_" offers, however I would either remove it from the column if it doesn't bring any info, or I would create another column where that info would be stored. Even better if this is something that could be placed in lookup table I would move that long description in lookup table and make the smallest int type column that refers the lookup table.

The benefit wouldn't be higher column selectivity, but narrower index values which helps performance allowing more index rows to fit the buffer.
Selectivity of column prefix doesn't matter, statistics are maintained on column level.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2006-05-17 : 11:16:52
Good point that .... create two new columns MyColFirstHalf and MyColSecondHalf, and then a calculated column, with the original name, that concatenates them - then maybe an INSTEAD OF trigger that splits them out as they are saved (only if you can't alter the application/Sproc to do the job) - so all applications that reference the original column name in SELECTs will continue to work

Kristen
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-05-19 : 14:14:48
Thanks all for the great input.

Indexed computed columns and indexed views I ruled out pretty quickly because of all restricitions and muck with them.
I tried charindex iof like, which resulted i table scans iof index seek, and the performance just as bad.

The table also does not carry a clustered index, but does have an identity. I have deferred rebuilding it but will do so, and in the process remove the overhead of "OrderCode=".
If that does not help perf, I will probably add the checksum index that MVJ suggested.
Mind you, I have done that before in a couple of cases (spanning several columns, not part of a column), so I did not think of it here .

Just thought I'd post an update and say thanx!

rockmoose
Go to Top of Page

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2006-05-19 : 14:36:13
If you add the checksum column, it is a good idea to add a check constraint to make sure the checksum and column value don't get out of sync. That is one thing in favor of a computed column; they can't get out of sync.

checksum(substring(MyCol,1,40)) = MyChecksumCol


CODO ERGO SUM
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-05-19 : 15:27:00
quote:
Originally posted by Michael Valentine Jones

If you add the checksum column, it is a good idea to add a check constraint to make sure the checksum and column value don't get out of sync. That is one thing in favor of a computed column; they can't get out of sync.

checksum(substring(MyCol,1,40)) = MyChecksumCol



Hmm, but that could cause problems for dml operations, unless there are instead-of triggers.

rockmoose
Go to Top of Page

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2006-05-19 : 16:28:50
quote:
Originally posted by rockmoose

quote:
Originally posted by Michael Valentine Jones

If you add the checksum column, it is a good idea to add a check constraint to make sure the checksum and column value don't get out of sync. That is one thing in favor of a computed column; they can't get out of sync.

checksum(substring(MyCol,1,40)) = MyChecksumCol



Hmm, but that could cause problems for dml operations, unless there are instead-of triggers.

rockmoose


If you add a new column, you have to maintain it some way. I'm just suggeting adding a constraint to make sure the app maintains both columns in sync. Otherwise, what's the point?

A trigger might be a good alternative. A computed column is probably the least effort as far a changing DML, and it really does fit the definition of what a computed column should be.



CODO ERGO SUM
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-05-19 : 16:49:23
For the computed column I am concerned about the:
quote:
BOL: The connection on which the index is created, and all connections attempting INSERT, UPDATE, or DELETE statements that will change values in the index, must have six SET options set to ON and one option set to OFF. The optimizer ignores an index on a computed column for any SELECT statement executed by a connection that does not have these same option settings

I can set these in the sproc that does the search (and insert).
But what happens if a connection does INSERT, UPDATE, or DELETE without those options?
Nothing?, just the index not being used, or is there an error?

rockmoose
Go to Top of Page

cmdr_skywalker
Posting Yak Master

159 Posts

Posted - 2006-05-19 : 17:57:31
The CHARINDEX converts the other expression to Unicode if the other is not. So, make sure both are in the unicode type. In general, can you paste the whole query and execution plan? The problem may lie somewhere else. Do you have a baseline? If not, run this baseline generator on the target database and another run on a database less used (say master). Let us know the result: (if you have a high end machine, set the limit to a higher number)

PRINT '--------------------------------------------------------------------------'
PRINT '-= Performance Baseline Generator v1.0 =-'
PRINT '--------------------------------------------------------------------------'
PRINT 'Author: Perez, Jeffrey '
PRINT 'Date: ' + CONVERT(VARCHAR, GETDATE(), 100)
PRINT 'Server: '+ CAST(SERVERPROPERTY('ServerName') AS VARCHAR)
PRINT 'User: ' + SUSER_SNAME() + ', ' + CURRENT_USER
PRINT 'Workstation: ' + HOST_NAME()
PRINT 'Version Info:'
PRINT ' ' + REPLACE(@@VERSION,CHAR(9),' ')
PRINT '--------------------------------------------------------------------------'
GO
PRINT 'Running Test...'
GO
IF (SELECT OBJECT_ID('test')) IS NOT NULL DROP TABLE test
GO
CREATE TABLE test(
SAID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
DATA CHAR(8000)
)
GO
DECLARE
@Counter INT,
@Start DATETIME,
@End DATETIME,
@Limit INT

SET NOCOUNT ON
SET @Start = GETDATE()
SET @Counter = 1
SET @Limit = 10000
WHILE @Counter < @Limit
BEGIN
INSERT INTO TEST(DATA)
SELECT SPACE(8000)

SET @Counter = @Counter + 1
END
SET @End = GETDATE()
PRINT 'Test Complete.'
PRINT ''
PRINT 'Database: ' + DB_NAME()
PRINT 'Start Time:' + CONVERT(VARCHAR, @Start,114)
PRINT 'End Time:' + CONVERT(VARCHAR, @End,114)
PRINT 'Baseline Mark: ' + LTRIM(STR(@Limit/DATEDIFF(ss, @Start, @End))) + ' transaction/s'
GO
IF (SELECT OBJECT_ID('test')) IS NOT NULL DROP TABLE test
GO
PRINT '--------------------------------------------------------------------------'




May the Almighty God bless us all!
Go to Top of Page

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2006-05-19 : 18:02:48
quote:
Originally posted by rockmoose

For the computed column I am concerned about the:
quote:
BOL: The connection on which the index is created, and all connections attempting INSERT, UPDATE, or DELETE statements that will change values in the index, must have six SET options set to ON and one option set to OFF. The optimizer ignores an index on a computed column for any SELECT statement executed by a connection that does not have these same option settings

I can set these in the sproc that does the search (and insert).
But what happens if a connection does INSERT, UPDATE, or DELETE without those options?
Nothing?, just the index not being used, or is there an error?

rockmoose



The insert or update will fail if the settings are not correct.

This code shows what happens.


Create a table to test with:

drop table [dbo].[MY_CHECKSUM_TEST]
GO
CREATE TABLE [dbo].[MY_CHECKSUM_TEST] (
[ID] [int] IDENTITY (1, 1) NOT NULL ,
[MY_TEXT] [varchar] (60)
COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[MY_CHECKSUM] AS (checksum([MY_TEXT]))
)
GO
ALTER TABLE [dbo].[MY_CHECKSUM_TEST]
ADD CONSTRAINT [PK_MY_CHECKSUM_TEST]
PRIMARY KEY CLUSTERED ([ID])
GO

set ANSI_PADDING,
ANSI_WARNINGS,
CONCAT_NULL_YIELDS_NULL,
ARITHABORT,
QUOTED_IDENTIFIER,
ANSI_NULLS on
GO
set NUMERIC_ROUNDABORT off
GO
CREATE INDEX [IX_MY_CHECKSUM_TEST_MY_CHECKSUM]
ON [dbo].[MY_CHECKSUM_TEST]([MY_CHECKSUM])
GO
set arithabort OFF
GO
set numeric_roundabort OFF
GO
set quoted_identifier OFF
GO


Test inserts and updates:


set QUOTED_IDENTIFIER off
set arithabort off
go
print 'Bad Insert Settings'
go
insert into [dbo].[MY_CHECKSUM_TEST] ( [MY_TEXT] )
select [MY_TEXT] = 'aaa '+convert(varchar(400),newid())

go
set QUOTED_IDENTIFIER on
set arithabort on
go
print 'Good Insert Settings'
go
insert into [dbo].[MY_CHECKSUM_TEST] ( [MY_TEXT] )
select [MY_TEXT] = 'bbb '+convert(varchar(400),newid())

go
set QUOTED_IDENTIFIER off
set arithabort off
go
print 'Bad Update Settings'
go
update [dbo].[MY_CHECKSUM_TEST]
set [MY_TEXT] = 'xxxx '+ [MY_TEXT]
go
set QUOTED_IDENTIFIER on
set arithabort on
go
print 'Good Update Settings'
go
update [dbo].[MY_CHECKSUM_TEST]
set [MY_TEXT] = 'zzzz '+ [MY_TEXT]
go
select * from [dbo].[MY_CHECKSUM_TEST]
go
truncate table [dbo].[MY_CHECKSUM_TEST]


Results:


Bad Insert Settings
Server: Msg 1934, Level 16, State 1, Line 1
INSERT failed because the following SET options have incorrect settings: 'QUOTED_IDENTIFIER, ARITHABORT'.
Good Insert Settings

(1 row(s) affected)

Bad Update Settings
Server: Msg 1934, Level 16, State 1, Line 1
UPDATE failed because the following SET options have incorrect settings: 'QUOTED_IDENTIFIER, ARITHABORT'.
Good Update Settings

(1 row(s) affected)

ID MY_TEXT MY_CHECKSUM
----------- ------------------------------------------------------------ -----------
1 zzzz bbb B661FF6B-BA03-4042-B717-98F6E4496D70 -1287381225

(1 row(s) affected)







CODO ERGO SUM
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-05-19 : 19:00:19
Thank you for testing MVJ!

Now that pretty much rules out any indexed computed columns, because the last thing I want is funny errors creeping up somewhere.
So I will need to implement iof triggers, or just take my allergic to null medecine and use regular triggers.

This is great, learning something every day.

rockmoose
Go to Top of Page
    Next Page

- Advertisement -