Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
Taurkon
Starting Member
26 Posts |
Posted - 2007-03-27 : 14:30:16
|
I have a fairly large table that is loaded monthly with approx 150 million records at any given time, and has 13 columns. At the end of each month, the data is purged.The majority of the columns are ID's pointing to other tables. The table has 10 columns that make up the primary key, an amount column, and 2 date stamp columns.All of the ID's that make up the primary key are used to join to other tables, or are frequently included in SQL where clauses when using this table.My question is, should I be using a concatenated pk index (as I am currently), or use an identity key and have alternate indexes for the different ID's contained in the table? Other suggestions?Table Usage: This is a staging table before the data is moved into the warehouse. The data is denormalized before being transformed into the warehouse and multiple lookups are done against this table. Therefore, there are many writes to this table/month (150 million approx) and many reads against the data (transformation into warehouse ) Is there any other details required that I have failed to mention to answer my question?Appreciated, |
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2007-03-27 : 14:46:25
|
How many other indexes are currently on the table?rockmoose |
 |
|
Taurkon
Starting Member
26 Posts |
Posted - 2007-03-27 : 14:55:19
|
At present, none. Because of the already large load times, we have not begun to optimize yet. This is why I am asking the questions now. |
 |
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2007-03-27 : 15:07:58
|
Sounds like probably a thin clustered (identity key) PK is the way to go.Then add additional indexes that cover the most common usage scenarios for the table,and include (cover) the additional data columns (amount + maybe datestamp columns) in those indexes, if those columns are used in the read queries.In SQL2000 you just include them in the index, in SQL2005 there is an INCLUDE clause for the CREATE INDEX.Also keep a UNIQUE index for the "natural key" of the table (the 10 column current PK).What is the usage of the datestamp columns?Are they used in the queries (for range selectioning) or just additional info/metadata?rockmoose |
 |
|
Taurkon
Starting Member
26 Posts |
Posted - 2007-03-27 : 15:45:39
|
The date stamps are not migrated to the warehouse and are rarely used.So, if I understand you correctly, the suggestion is:1. Create a clustered Identity PK.2. Create a unique natural key index (current pk) 3. Create seperate individual indexes for most common usage keys (4 or 5 of the ID's that consist of the current PK)Appreciate your input on this.One additional questions... I have to assume the insert performance on this will be large. I'll do some testing, but would it make sense to create some (or all) the indexes on the table AFTER it is loaded, since this is a monthly process? |
 |
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2007-03-27 : 17:36:56
|
Yes I think it will make sense for creating the secondary indexes afterwards.If the table is essentially empty before the first load, then I would only keep the PK and the UC (natural unique constraint)The UC is enforcing the data integrity.If you build the secondary indexes after loading the table,the overall process will most likely be quicker,and the secondary indexes will be less fragmented.If the secondary indexes also cover the Amount column (I suppose you want to use that in the queries) then queries going against those indexes will benefit greatly.rockmoose |
 |
|
|
|
|
|
|