
 BCP or DTS? BULK INSERT?


Lumbago
Norsk Yak Master

3271 Posts

Posted - 2006-06-21 : 05:09:35
Hi,

The company I work for has 5 different, seriously old COBOL systems running on unix machines that handle our primary business. Every evening/night most of the tables (files) in these databases are dumped over to SQL Server 2000 using DTS, which has been working fine for quite some time and still is. This is for reporting purposes. However, the old systems don't have any way of tracking changes, so what we do is transfer *all* the data in each file every single night. As you might imagine this takes quite a bit of time, and I'm now looking for ways to improve performance.

Right now we are running several DTS-packages with SQL-commands via some sort of API using ODBC. I don't know for sure but I suspect that there is quite a bit of overhead doing this, both in the DTS itself and in the API, and I have been thinking about using BCP or BULK INSERT instead.

Do you have any general recommendations? Is it possible to foresee how big the performance difference would be, or whether I would see any difference at all? There are no transformations in the DTS. It will take quite a bit of work to set up a sensible test here, so I'm trying to consider my options in advance...

--
Lumbago
"Real programmers don't document, if it was hard to write it should be hard to understand"

mwjdavidson
Aged Yak Warrior

735 Posts

Posted - 2006-06-21 : 06:53:06
What sort of data volumes are you talking about? I've experienced very significant improvements in the past by replacing DTS datapumps with BCP / BULK INSERT (figures of 3-6x faster get quoted, but I can't substantiate that, I'm afraid). If you are currently using one-to-one column mappings, a low-impact quick fix might be to replace them with a single many-to-many mapping (which should be no problem, as you have no transformations). Each one-to-one mapping is a separate transformation that has to invoke the script engine separately, so consolidating them should improve performance.



Mark

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2006-06-21 : 07:49:02
Each of the 5 systems results in a SQL Server database of roughly 5GB, and the DTS packages take about 5-6 hours to finish for each system. I find this quite slow and would like to get each of them under an hour, which in my mind shouldn't be impossible.

I'm not too comfortable with DTS...what are these many-to-many transformations?

--
Lumbago
"Real programmers don't document, if it was hard to write it should be hard to understand"

mwjdavidson
Aged Yak Warrior

735 Posts

Posted - 2006-06-21 : 08:50:34
That's pretty big! I'd say you're guaranteed a performance boost by going to BULK INSERT/BCP. When you edit the DTS datapump task, do you see a separate line going from each source column to each destination column, or do the lines converge on a central point before diverging to join each destination column? If it's the former, you should be able to improve performance a little by selecting all the transformations, deleting them, selecting all your source and destination columns, then re-adding the transformations.

Mark

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2006-06-21 : 09:09:32
I'm also pretty sure that we'll be able to see a performance boost, but it's always nice to have a second opinion. As for the DTS, there is always one line mapping each source column to its destination column...let me see if I got you right:
What I have now:
SourceCol1 --- DestCol1
SourceCol2 --- DestCol2
SourceCol3 --- DestCol3

What I should have:
SourceCol1 \ / DestCol1
SourceCol2 - --- - DestCol2
SourceCol3 / \ DestCol3
Is this what you're saying?

--
Lumbago
"Real programmers don't document, if it was hard to write it should be hard to understand"

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2006-06-21 : 10:32:22
It's going to depend a lot on what you're inserting into, but certainly in the ideal situation, 5GB/hr should be no problem at all.

I'm inserting about 90 files totalling 12GB using bcp into an empty database in about 70 minutes. That's with the files on one machine and the database server on a different machine, running bcp on the machine with the files. I can't use BULK INSERT from the database server because my account on that server doesn't have the necessary permissions -- which may certainly be a consideration for whether you can use BULK INSERT.

Doing the same insert on my home box with BULK INSERT, and the files and database on different local disks takes about 30 minutes (although that's SQL Server 2005 rather than 2000). Trying just one file, 52 million rows, 925MB (so rather narrow rows), the BULK INSERT was somewhat faster than bcp: 3 minutes versus 4 minutes 40 seconds. For the record, my best result with SSIS (not using the Bulk Insert Task) was 4:20.

For each of these, the tables conform to the requirements for minimally logged bulk copy. Specifically, the database uses the simple recovery model, the tables have no triggers, no foreign keys, one clustered primary key, no other indexes, no rows (at the start!), and TABLOCK is being used.
In addition, the data files are ordered in clustered index order and the ORDER hint is specified. If you bulk insert data that is not in clustered key order, what actually happens is that the data gets inserted into tempdb and sorted there before finally being inserted into the destination table.
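
For reference, here's roughly what such a load looks like. This is only a sketch: the server, table, column and file names are invented, and you'd adjust the terminators to match your own extract format.

-- Minimally logged load into an empty table whose only index is the
-- clustered primary key (simple recovery model assumed)
BULK INSERT ReportDB.dbo.CustomerStage
FROM 'D:\extracts\customer.txt'
WITH (
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '\n',
    TABLOCK,                   -- needed for minimal logging
    ORDER (CustomerID ASC)     -- the data file is pre-sorted on the clustered key
);

The rough bcp equivalent, run from the machine holding the files, would be something like:

bcp ReportDB.dbo.CustomerStage in D:\extracts\customer.txt -S ReportSrv -T -c -t "|" -h "TABLOCK, ORDER(CustomerID ASC)"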

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2006-06-22 : 03:06:08
This is good news, Arnold! At least it seems worthwhile to test this...maybe I should ask my boss to cut me a deal: if I can have all of them done in less than 2 hours each, I'll get a raise.

The requirements are mostly in my favor as well. The SQL Server is for reporting purposes only, which means no inserts/updates, the recovery model is simple, and there are no triggers, foreign keys or any of that RDBMS stuff. The server is 2000 Standard, but since I'm the "only" db person I think I'll manage to get the proper permissions to do what I need. Hopefully I'll be able to get the files in clustered index order too, so it all looks good.

What we'll probably try to do is create the files on the unix boxes, ftp them over to the db server, truncate each table in the database, drop the indexes, BULK INSERT the files specifying TABLOCK, and recreate the indexes. And I'm thinking about doing all of this inside a stored procedure...not too fond of those DTS packages. How does that sound?
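
For each table I'm picturing something along these lines (the table, index, and file names below are just placeholders I made up to illustrate):

-- One table's load step: empty the table, drop the nonclustered index,
-- bulk load the file with TABLOCK, then rebuild the index
TRUNCATE TABLE dbo.CobolOrders;

DROP INDEX CobolOrders.IX_CobolOrders_OrderDate;

BULK INSERT dbo.CobolOrders
FROM 'D:\ftp\orders.txt'
WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n', TABLOCK);

CREATE INDEX IX_CobolOrders_OrderDate ON dbo.CobolOrders (OrderDate);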

--
Lumbago
"Real programmers don't document, if it was hard to write it should be hard to understand"

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2006-06-22 : 03:55:02
Sounds good to me!

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-06-22 : 05:15:47
I would consider loading the tables into a staging area,
then incrementally loading the dw from the staging area.
That would increase availability: indexes can be kept, and the dw model can be adjusted if need be.

And bulk insert is definitely the method I would use.
We used to use DTS for lots of text files, but have moved away from that method.

rockmoose

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2006-06-22 : 06:17:08
rock, what do you mean by "staging area"? And by dw I assume you mean data warehouse, but it's not actually a data warehouse, just basically a replica of the cobol systems on a SQL Server. No logic/transformation whatsoever is done on the data, just a straight dump into the SQL Server.

--
Lumbago
"Real programmers don't document, if it was hard to write it should be hard to understand"

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-06-22 : 08:02:43
I just meant that you first dump the data into an area that is a replica of the cobol system.
The "real" reporting area keeps another replica (with proper indexes and keys etc.).

The load process would go roughly like this:
a) truncate the coboltable replica in the staging area
b) bulk insert the cobol table into the staging area
c) incrementally load the corresponding reporting table in the reporting area from the staging area
d) update statistics for the reporting table, if need be
e) truncate the coboltable replica in the staging area to save space
f) repeat a-e for the next cobol table


The point is that the reporting area would be available, even during the load.
If the load for some reason fails, you don't end up with a truncated table in the reporting area.
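
In T-SQL the flow for one table would be roughly like the sketch below. All names are made up, the staging and reporting tables are shown in the same database just to keep the example short, and the "incremental" step is shown in its simplest form, replacing the reporting table's contents inside a transaction; a real incremental merge on a key would follow the same shape.

-- (a) empty the staging replica
TRUNCATE TABLE dbo.Stage_Customer;

-- (b) bulk load the cobol extract into the staging replica
BULK INSERT dbo.Stage_Customer
FROM 'D:\ftp\customer.txt'
WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n', TABLOCK);

-- (c) refresh the reporting table from staging in one transaction,
--     so a failed load rolls back and the old contents stay in place
BEGIN TRAN;
DELETE dbo.Customer;
INSERT INTO dbo.Customer
SELECT * FROM dbo.Stage_Customer;
COMMIT TRAN;

-- (d) refresh statistics on the reporting table
UPDATE STATISTICS dbo.Customer;

-- (e) empty the staging replica again to save space
TRUNCATE TABLE dbo.Stage_Customer;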

rockmoose

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2006-06-22 : 08:38:58
Hm, you might have a point here, rock. Even though there is no reporting activity outside regular business hours (0700-1800), it might be a good idea to keep the downtime to an absolute minimum. Right now everything is completely unavailable for about 5.5 hours starting at 7 o'clock in the evening, and if I could cut the downtime to only half an hour or so for each system, I could claim an even bigger raise. I'll have to give this some thought...

--
Lumbago
"Real programmers don't document, if it was hard to write it should be hard to understand"

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2006-06-27 : 09:10:27
Just an update on how things went: I struggled quite a bit with getting the bulk insert to work in my favor, and the format file was basically a mess. So I decided to Keep It Simple and Stupid: bulk insert each entire file in its original format into a staging database and use SUBSTRING instead. Which worked great! The bulk insert and the substring parsing took exactly 2 minutes for a 220MB file with 1.9 million records, and I'm a happy camper.
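
In case anyone wants to do the same, the gist of it is below. The table names, column positions and file name are made up for the example, and it assumes the extract contains no tab characters (the default field terminator), so each whole line lands in the single staging column.

-- (run in the staging database)
-- Staging table holding one raw fixed-width line per row
CREATE TABLE dbo.RawOrders (RawLine varchar(8000));

-- Load the file as-is: one column, no format file needed
BULK INSERT dbo.RawOrders
FROM 'D:\ftp\orders.txt'
WITH (ROWTERMINATOR = '\n', TABLOCK);

-- Carve the fixed-width fields out with SUBSTRING into the reporting table
INSERT INTO ReportDB.dbo.Orders (OrderNo, CustomerNo, OrderDate)
SELECT SUBSTRING(RawLine, 1, 10),
       SUBSTRING(RawLine, 11, 8),
       SUBSTRING(RawLine, 19, 8)
FROM dbo.RawOrders;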

2 tips it took me *quite* some time to figure out:
- always transfer text-based data files in ascii mode when using ftp (I knew this, but the guy who did the transfer failed to tell me!)
- when creating a bcp format file, always put a CRLF after the last character in the file (!!!)

--
Lumbago
"Real programmers don't document, if it was hard to write it should be hard to understand"

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-06-27 : 11:52:48
Great, that sounds fast enough!
Thanks for the update.

rockmoose