Author |
Topic |
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2005-12-05 : 14:18:58
|
Does someone out there already know which method would use the least amount of time?A nightly process will re-create several rollup tables each with a clustered index. Each table will be about 3 gigs and contain about 20 mil rows.--------------------------------------------CREATE a table with a clustered indexINSERT table (in order of clustered index?)CREATE a table without any indexINSERT table in order of clustered indexCREATE CLUSTERED INDEXSELECT INTO table CREATE CLUSTERED INDEXany other alternative I didn't mention--------------------------------------------will the "pre-ordering" help save time building the index?Be One with the OptimizerTG |
|
Kristen
Test
22859 Posts |
Posted - 2005-12-05 : 14:58:41
|
"will the "pre-ordering" help save time building the index?"There is a hint, to that effect, that you can use if you are BPC'ing in the data (Maybe BULK LOAD too, I'm not sure)Kristen |
 |
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2005-12-05 : 15:50:13
|
Thanks Kristen. Since the data is all originating in sql server I didn't think I'd use that approach. However, it turns out to be faster to bcp the data out, sort it, then bulk load back into a table. I'd condsider it.Be One with the OptimizerTG |
 |
|
X002548
Not Just a Number
15586 Posts |
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2005-12-05 : 16:53:44
|
t-sql scripts called from a job step. Source data is raw tables in various databases (same server)Be One with the OptimizerTG |
 |
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2005-12-05 : 17:49:25
|
fyi:test results of the 3 methods mentioned above. I have yet to try BCPing out to a file, ordering, and then BCPing back into table with clustered index using the load hint "-h ORDER(<column>)"--CREATE a table without any index--INSERT table in order of clustered index--CREATE CLUSTERED INDEX--(19025675 row(s) affected)--854 seconds--==============================================--SELECT INTO table --CREATE CLUSTERED INDEX--(19025675 row(s) affected)--877 seconds--==============================================--CREATE a table with a clustered index--INSERT table (in order of clustered index?)--(19025675 row(s) affected)--1016 seconds Be One with the OptimizerTG |
 |
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2005-12-05 : 17:54:11
|
Most likely, it will be faster if you create the table with the clustered index (or clustered primary key constraint). If you need other indexes, create them after you are done loading the data. If possible, load the data in the clustered index order. Building the clusterd index after loading the table is usually far more overhead than having it in place before you load it.If you do it with TSQL, you may run into problems with your transaction log if you try to insert 20 million rows in one transaction. When I do something like this, I usually use a loop to load about 100,000 rows in each transaction, and load them in clustered index sequence.It may be more convenient to use BCP, with a batch size of say 100,000 so that it doesn't load in one huge transaction. BCP it out in native mode using a query to order the data in clustered index sequence, and then BCP it in.Another alternative would be to use a DTS data pump task with a batch size of about 100,000. This is often the fastest way, since you avoid the two step BCP out/BCP in. Again, make the input query to the data pump ordered in clustered index sequence.Of course, if you have time, you should test the alternatives to see which is actually the fastest in your situation.CODO ERGO SUM |
 |
|
X002548
Not Just a Number
15586 Posts |
Posted - 2005-12-06 : 08:54:47
|
quote: Originally posted by Michael Valentine Jones Building the clusterd index after loading the table is usually far more overhead than having it in place before you load it.CODO ERGO SUM
Really? I gotta test this....did you read this somewhere, or just personal experience?Brett8-)Hint: Want your questions answered fast? Follow the direction in this linkhttp://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspxAdd yourself!http://www.frappr.com/sqlteam |
 |
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2005-12-06 : 09:20:01
|
That has been my experience. However, the times TG got on his tests showed this was slower. Maybe it depends on the structure of the table or indexes or some other factor. It's always good to test some alternatives in your own environment.quote: Originally posted by X002548
quote: Originally posted by Michael Valentine Jones Building the clusterd index after loading the table is usually far more overhead than having it in place before you load it.CODO ERGO SUM
Really? I gotta test this....did you read this somewhere, or just personal experience?Brett8-)Hint: Want your questions answered fast? Follow the direction in this linkhttp://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspxAdd yourself!http://www.frappr.com/sqlteam
CODO ERGO SUM |
 |
|
Kristen
Test
22859 Posts |
Posted - 2005-12-06 : 16:03:42
|
"did you read this somewhere, or just personal experience"Kinda makes sense to me ... but I'm just guessing.If the data is randomly arranged (or at least arranged NOT in the order of the proposed Clustered Index, shed-loads of data is going to have to be rearranged to get it ordered by the clustered index.Creating the table with clustered index, and then inserting the data (preferably ordered by clustered index) is going to present the data in the most ergonomic way for the clustered index to be built [with the minimum of page splits] isn't it?However, I'm figuring that you've got a tangential thought, Brett, and I'm "all ears" !Kristen |
 |
|
|