Data Model Design For Query Performance

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

Database Design and Application Architecture

Data Model Design For Query Performance

Author

Topic

bigsapper
Starting Member

8 Posts

Posted - 2008-04-22 : 09:08:21

I have an opportunity to rebuild a database model with the express purpose of improving query performance. So given the following I have a few questions.

Table A (~500M records)
Primary Key Field (int)
Field 1 (varchar)
Field 2 (varchar)
Field 3 (varchar)
Field 4 (varchar)
Field 5 (varchar)

Table B (1B+ records)
Primary Key Field (int)
Foreign Key Field (int)
Field 1 (varchar)
Field 2 (varchar)
Field 3 (varchar)
Field 4 (varchar)
Field 5 (varchar)

* Assumed: Tables are inner joined on all queries. The database is readonly.

-- Most of my lookups are based on querying Field 1 of Table A. The data content of Field 1, Table A is 90% unique.
1) Would it be more beneficial to put the clustered index on Field 1 instead of the PK field in Table A?
2) Can an Identity column be non-clustered?
3) Alternatively, would it be beneficial to build a separate lookup table with just the PK & Field 1 of Table A, with a clustered index on the lookup table Field 1 which I join on Table A? (did that make sense?)

-- I have a secondary lookup that performs queries on Fields 1, 2, 3, 4 & 5 of Table B
1) Would it be more beneficial to create an additional indexed lookup column of the concantenated values of Fields 1-5 of Table B versus a covering index of all 5 columns?
2) Does a clustered index have to be unique?
3) Would a clustered index be more beneficial over Fields 1-5 or the special lookup column versus the PK or FK fields?
4) Would creating a special lookup table with just the requisite fields be more beneficial?

An extra question. The existing data model uses the CHAR datatype for all columns less than 9 characters wide and the columns are set to allow nulls. This requires every select statement to COALESCE() and RTRIM() all these columns. I intend to make all (affected) columns VARCHAR, NOT NULL with a default value of a 0-length string.
Will this enhance query performance?

Thanks in advance for any insight.

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2008-04-22 : 11:57:54

1. probably. you'll have to test this.
2. yes
3. not really. that would be basicaly the same as putting the clustered index on the identity PK and Field 1

1. don't think so. maybe have a checksum of the 5 and index that? depends on how large the fields are.
2. yes. if it's not, the sql serevr adds an internal uniqueifier that makes it unique. this doesn't change the acctuall data in the db.
3. no sure what you mean here. CI on all 5 would be great. you'll always have to query the whole 5 columns to use it though.
4. don't think so.

about those look up tables: maybe you should give us an example.

1. yes. using a function on the column doesn't use an index on that column.

_______________________________________________
Causing trouble since 1980
blog: http://weblogs.sqlteam.com/mladenp
SSMS Add-in that does a few things: www.ssmstoolspack.com

bigsapper
Starting Member

8 Posts

Posted - 2008-04-22 : 12:15:12

quote:
Originally posted by spirit1

1. probably. you'll have to test this. OK
2. yes OK
3. not really. that would be basicaly the same as putting the clustered index on the identity PK and Field 1 OK

1. don't think so. maybe have a checksum of the 5 and index that? depends on how large the fields are. Is there a threshold value I should be looking at?
2. yes. if it's not, the sql serevr adds an internal uniqueifier that makes it unique. this doesn't change the acctuall data in the db. So you're saying, I could put a clustered index on non-unique data and sql server will deal with the uniqueness itself? Is that a perf hit?
3. no sure what you mean here. CI on all 5 would be great. you'll always have to query the whole 5 columns to use it though. Answered in #1, thanks.
4. don't think so. OK

about those look up tables: maybe you should give us an example. see below

1. yes. using a function on the column doesn't use an index on that column.

_______________________________________________
Causing trouble since 1980
blog: http://weblogs.sqlteam.com/mladenp
SSMS Add-in that does a few things: www.ssmstoolspack.com

Consider Table A, People and Table B, Addresses. The lookup table would be a de-normalized build of the join of People & Addresses with the following columns...
(Table A Primary Key Field)
FirstName
LastName
HouseNumber
StreetName
PostalCode

Now consider a third table, AliasNames, child of People. The lookup table would now comprise all occurences of "original name" and associated adrresses as well as each "alias name" and associated addresses.

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2008-04-22 : 12:36:43

1. the checksum was a suggestion to lower index size. if all you have are ints or small varchar don't bother. you might test this for fun though.
2. yes you can. no it's not a perf hit, since the uniqifier is only added to the duplicate rows.
3. well denormalized data is faster to look up... but there's no definite answer here. you'll have to test it. if you're doing singleton lookups on the CI the size of the tables doesn't really matter that much.

however do keep in mind that theory (which is what you're getting here since we don't have access to your system)
is great but you'll have to test your app thoroughly.

_______________________________________________
Causing trouble since 1980
blog: http://weblogs.sqlteam.com/mladenp
SSMS Add-in that does a few things: www.ssmstoolspack.com

bigsapper
Starting Member

8 Posts

Posted - 2008-04-22 : 13:41:32

Thanks. I appreciate the insight.

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-04-22 : 14:45:18

Maybe what Spirit says is that when a table does not have a clustered index (a heap),
then nonclustered indexes use a row locator built from the

1) file identifier
2) page number
3) number of the row on the page

to point to the date, and it is known as row id (rid). This row id is not retrievable.
If the table has a clustered index, then nonclustered indexes use the key of the clustered index to locate the data.

E 12°55'05.25"
N 56°04'39.16"

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-04-22 : 14:51:47

Kimberly Tripp has an Powerpoint presentation here
http://00001001.ch/download/techdays2005/d11_slides.ppt

Fernando Guerrero has some insights here
http://blahblah.springnote.com/pages/370631/attachments/153485

E 12°55'05.25"
N 56°04'39.16"

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2008-04-22 : 15:15:34

> Maybe what Spirit says is that when a table does not have a clustered index (a heap),
> then nonclustered indexes use a row locator built from the

for which point do you think i'm saying that?

_______________________________________________
Causing trouble since 1980
blog: http://weblogs.sqlteam.com/mladenp
SSMS Add-in that does a few things: www.ssmstoolspack.com

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources