Using Indexed Computed Columns to Improve Performance

By Bill Graziano on 16 January 2007 | Tags: Performance Tuning , Indexes


I recently read a blog post on doing case-insensitive text searches on SQL Server 2005. The post said that an index on a computed column might be used even if the computed column itself wasn't used in the WHERE clause. I was curious to test that and see how far I might take it. Years ago I worked on a case-sensitive application and I vividly remember all the headaches that caused me. I was also curious to see if I could use that for datetime columns to strip off the time portion and easily do a "date-only" search.

I started with a copy of a table from AdventureWorks.

USE [AdventureWorks]
GO
IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLES 
			WHERE TABLE_NAME = 'NewContact'
			AND TABLE_SCHEMA = 'Person')
	DROP TABLE [Person].[NewContact]
GO
CREATE TABLE [Person].[NewContact]  (
	[ContactID] [int] PRIMARY KEY NOT NULL,
	[Title] [nvarchar](8) COLLATE SQL_Latin1_General_CP1_CS_AS NULL,
	[FirstName] nvarchar(50) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL,
	[LastName] nvarchar(50) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL,
	[Suffix] [nvarchar](10) COLLATE SQL_Latin1_General_CP1_CS_AS NULL,
	[EmailAddress] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CS_AS NULL,
	[ModifiedDate] [datetime] NOT NULL  DEFAULT (getdate())  )
GO
INSERT INTO [Person].[NewContact] ( [ContactID], [Title], [FirstName],
	[LastName], [Suffix], [EmailAddress], [ModifiedDate] )
select ContactID, Title, FirstName, LastName, Suffix, EmailAddress, 
	ModifiedDate = DATEADD(mi, ContactID, ModifiedDate)
FROM Person.Contact
GO
CREATE INDEX IX_NewContact_LastName ON Person.NewContact(LastName);
GO
CREATE INDEX IX_NewContact_ModifiedDate ON Person.NewContact(ModifiedDate) 
GO

This creates a copy of the contacts table with all the NVARCHAR columns set to case-sensitive. It also creates indexes on the LastName column and DateModified column. I tested a series of SELECT statements against the table to establish a baseline.

The SELECT statements and the results are:

Statement Base index Upper Index CI Index
1. SELECT * FROM Person.NewContact WHERE LastName = 'Mcanich' Index seek, query cost = 0.007, returned one row Index seek, query cost = 0.007, returned one row Index seek, query cost = 0.007, returned one row
2. SELECT * FROM Person.NewContact WHERE LastName = 'mcanich' Index seek, query cost = 0.007, returned zero rows Index seek, query cost = 0.007, returned zero rows Index seek, query cost = 0.007, returned zero rows
3. SELECT * FROM Person.NewContact WHERE LastName = 'MCANICH' COLLATE SQL_Latin1_General_CP1_CI_AI Index scan, query cost = .127, returned one row Index scan, query cost = 0.129, returned one row Index seek of CI index, query cost = 0.007, returned one row
4. SELECT * FROM Person.NewContact WHERE UPPER(LastName) = UPPER('McAnich') Index scan, query cost = 0.08, returned one row Index seek of "Upper" index, query cost = 0.007, returned one row Index seek of "Upper" index, query cost = 0.007, returned one row
5. SELECT * FROM Person.NewContact WHERE LastNameUpper = UPPER('McAnich') (Failed) Index seek of "Upper" index, query cost = 0.007, returned one row Index seek of "Upper" index, query cost = 0.007, returned one row
6. SELECT * FROM Person.NewContact WHERE LastNameCI = 'mcanICh' (Failed) (Failed) Index seek of CI index, query cost = 0.007, returned one row

The first set of tests is in the "Base Index" column. When I compared the column to a scalar value it did an index seek (1 and 2). Query #3 specified a case-insensitive search using the COLLATE clause and that resulted in an very slow index scan. Query #4 converted them both to upper case and then compared them. This also resulted in a query scan but not nearly as bad. Still this had a cost ten times higher than the base query. One of the things I discuss in my performance tuning presentations is that any time you wrap a function around an indexed column it probably won't use the index efficiently. This seems to confirm that.

Next I created a computed column with the UPPER function and then built an index on it.

ALTER TABLE Person.NewContact
  ADD LastNameUpper AS UPPER(LastName)
GO
CREATE INDEX IX_NewContact_LastNameUpper ON Person.NewContact(LastNameUpper)
GO
 

The results of this are in the "Upper Index" column above. In Query #5 when we use the computed column it does an index seek on the new index just like I'd hoped it would. The really interesting result is Query #4. That also uses the new index and does an index seek even though we aren't using the new computed column. What the blog post says and what seems to be happening is that SQL Server is checking for a computed column that matches the WHERE clause. It finds it, finds that it's indexed and uses the index. Pretty cool if you ask me!

I also tested this by creating a case-insensitve computed column and testing that.

ALTER TABLE Person.NewContact
  ADD LastNameCI AS LastName COLLATE SQL_Latin1_General_CP1_CI_AI
GO
CREATE INDEX IX_NewContact_LastNameCI ON Person.NewContact(LastNameCI)
GO   

The results of this test are in the "CI Index" column in the table above. This performed just like the other computed column. Query #6 which explicitly used the column performed very well. But so did Query #3 which didn't explicitly use the new column.

I also wanted to test this on datetime functions to see if I could easily query just on the date portion of the column. The queries I tested are:

Statement Base index Date Only Index
1. SELECT * FROM Person.NewContact WHERE ModifiedDate >= '5/1/2003' AND ModifiedDate < '5/2/2003' Index seek, query cost = 0.03, eleven rows returned Index seek, query cost = 0.03, eleven rows returned
2. SELECT * FROM Person.NewContact WHERE ModifiedDate = '5/1/2003' Index seek, query cost = 0.006, zero rows returned Index seek, query cost = 0.006, zero rows returned
3. SELECT * FROM Person.NewContact WHERE CONVERT(VARCHAR(10), ModifiedDate, 101) = '05/01/2003' Index scan, query cost = 0.07, eleven rows returned Index scan, query cost = 0.07, eleven rows returned
4. SELECT * FROM Person.NewContact WHERE CONVERT(DATETIME, CONVERT(VARCHAR(10), ModifiedDate, 101), 101) = '5/1/2003' Index scan, query cost = 0.07, eleven rows returned Index seek on new index, query cost = 0.04, eleven rows returned
5. SELECT * FROM Person.NewContact WHERE ModifiedDateOnly = '5/1/2003' (Failed) Index seek on new index, query cost = 0.04, eleven rows returned
6. SELECT * FROM Person.NewContact WHERE CAST(CONVERT(VARCHAR(10), ModifiedDate, 101) AS DATETIME) = '5/1/2003' Index scan, query cost = 0.07, eleven rows returned Index scan, query cost = 0.07, eleven rows returned
7. SELECT * FROM Person.NewContact WHERE CONVERT(DATETIME, CONVERT(VARCHAR(10), ModifiedDate, 101)) = '5/1/2003' Index scan, query cost = 0.07, eleven rows returned Index scan, query cost = 0.07, eleven rows returned

Notice that in the script that created the NewContact table I added some minutes to each ModifiedDate. I also created an index on ModifiedDate. The first query is the preferred way of selecting one days worth of data. It does an index seek and is the fastest way to return data. The second query is a little faster but doesn't return any data. The comparison of a date-only value to a date with time value with always fail. The third query is the way I see this type of query written most frequenty. The column is converted to VARCHAR and then compared to the value. In the fourth query the column is converted back to datetime.

Next I added a computed column and an index on that column.

ALTER TABLE Person.NewContact
	ADD ModifiedDateOnly AS CONVERT(DATETIME, CONVERT(VARCHAR(10), ModifiedDate, 101), 101) 
GO
CREATE INDEX XI_NewContact_ModifiedDateOnly ON Person.NewContact(ModifiedDateOnly)
GO

This created a computed column that had the time removed from ModifiedDate. When I converted back to DATETIME I had to specifiy the format number (101) or SQL Server complained that the function wasn't deterministic. Deterministic functions always return the same value from the same input value and database state. For example, GETDATE() isn't deterministic.

The biggest improvement came in Query #4. Since the function around ModifiedDate matched the computed column it used the computed column and its index. It didn't use it in Query #6 where the outer CONVERT was replaced with a CAST function. When test Query #4 but added a bunch of white space into the WHERE clause it still used the index. That tells it isn't doing a simple hash of the text in the WHERE clause. When I changed from VARCHAR(10) to any other length it no longer used the new index. Query #7 also didn't use the new index. The only difference in that query was the outer CONVERT function didn't have a format number.

If you have existing application code that wraps functions around an indexed column and then doesn't use the index you may be able to use this approach to improve those queries. If the functions are consistent you should be able to add computed columns and realize immediate benefits.


Related Articles

Which is Faster: SAN or Directly-Attached Storage? (21 January 2008)

Using Included Columns in SQL Server 2005 (14 January 2008)

SQL Server Indexes: The Basics (26 November 2007)

Benchmarking Disk I/O Performance: Size Matters! (30 October 2007)

What I Wish Developers Knew About SQL Server (Presentation) (11 October 2007)

Introduction to Parameterization in SQL Server (7 August 2007)

SQL Server Storage Engine Team Blog (7 June 2006)

Statistics Used by the Query Optimizer in Microsoft SQL Server 2005 (1 June 2005)

Other Recent Forum Posts

SSRS Expression IIF Zero then ... Got #Error (11h)

Understanding 2 Left Joins in same query (1d)

Use a C# SQLReader to input an SQL hierarchyid (1d)

Translate into easier query/more understandable (1d)

Aggregation view with Min and Max (2d)

Data file is Compressed - Cannot Re-Attach, Cannot Restore (2d)

Sql trigger assign value to parameter (5d)

Need a sales $ for the past 90 days (5d)

- Advertisement -