Indexing a large table advice - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

Database Design and Application Architecture

Indexing a large table advice

Author

Topic

Taurkon
Starting Member

26 Posts

Posted - 2007-03-27 : 14:30:16

I have a fairly large table that is loaded monthly with approx 150 million records at any given time, and has 13 columns. At the end of each month, the data is purged.

The majority of the columns are ID's pointing to other tables. The table has 10 columns that make up the primary key, an amount column, and 2 date stamp columns.

All of the ID's that make up the primary key are used to join to other tables, or are frequently included in SQL where clauses when using this table.

My question is, should I be using a concatenated pk index (as I am currently), or use an identity key and have alternate indexes for the different ID's contained in the table? Other suggestions?

Table Usage: This is a staging table before the data is moved into the warehouse. The data is denormalized before being transformed into the warehouse and multiple lookups are done against this table. Therefore, there are many writes to this table/month (150 million approx) and many reads against the data (transformation into warehouse )

Is there any other details required that I have failed to mention to answer my question?

Appreciated,

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2007-03-27 : 14:46:25

How many other indexes are currently on the table?

rockmoose

Taurkon
Starting Member

26 Posts

Posted - 2007-03-27 : 14:55:19

At present, none. Because of the already large load times, we have not begun to optimize yet. This is why I am asking the questions now.

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2007-03-27 : 15:07:58

Sounds like probably a thin clustered (identity key) PK is the way to go.
Then add additional indexes that cover the most common usage scenarios for the table,
and include (cover) the additional data columns (amount + maybe datestamp columns) in those indexes, if those columns are used in the read queries.

In SQL2000 you just include them in the index, in SQL2005 there is an INCLUDE clause for the CREATE INDEX.

Also keep a UNIQUE index for the "natural key" of the table (the 10 column current PK).

What is the usage of the datestamp columns?
Are they used in the queries (for range selectioning) or just additional info/metadata?

rockmoose

Taurkon
Starting Member

26 Posts

Posted - 2007-03-27 : 15:45:39

The date stamps are not migrated to the warehouse and are rarely used.

So, if I understand you correctly, the suggestion is:

1. Create a clustered Identity PK.
2. Create a unique natural key index (current pk)
3. Create seperate individual indexes for most common usage keys (4 or 5 of the ID's that consist of the current PK)

Appreciate your input on this.

One additional questions... I have to assume the insert performance on this will be large. I'll do some testing, but would it make sense to create some (or all) the indexes on the table AFTER it is loaded, since this is a monthly process?

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2007-03-27 : 17:36:56

Yes I think it will make sense for creating the secondary indexes afterwards.
If the table is essentially empty before the first load, then I would only keep the PK and the UC (natural unique constraint)
The UC is enforcing the data integrity.

If you build the secondary indexes after loading the table,
the overall process will most likely be quicker,
and the secondary indexes will be less fragmented.

If the secondary indexes also cover the Amount column (I suppose you want to use that in the queries) then queries going against those indexes will benefit greatly.

rockmoose

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources