sqlserver_newbee
Starting Member
11 Posts
Posted - 2007-07-19 : 23:39:36
Hi All,

We get several flat data files in excess of 3-4 GB each month, each with more than 5 million records. I am working on a project to transfer these data files into a SQL Server database. I have designed the database and established the constraints on the tables, and I have also created a DTS package with a Transform Data Task for reading these fixed-length files and loading them into the tables. It all works fine up to here.

Now each month we get new files with new records plus updates to the old data. Because of the primary key constraints I obviously can't do a simple insert/append into the tables. I assume I need to load the new data into temp tables and then use T-SQL queries to check the records against the actual table: if a primary key already exists, update that row of the actual table, otherwise insert the new record from the temp table. Because of the size of the files it takes about 3 hours to insert the data from a text file into one table using the Transform Data Task. This additional check plus the table transfers may take much more time, and I have 20 such tables.

Can someone please suggest a suitable way of doing this? I am looking for a solution that is time-efficient and simple to implement, as I am a beginner. Any help is appreciated. Thanks a lot!
pootle_flump
1064 Posts
Posted - 2007-07-20 : 03:56:05
Personally I don't like DTS - I find it unwieldy and inflexible. I prefer to use T-SQL to import data using BULK INSERT (this is essentially a wrapper for BCP - the Bulk Copy Program). You can look both of these up in BoL for detailed explanations. They are very fast methods of getting data into the database, especially if you can define a clustered index that matches the order of the records in the file. Worth an experiment with one file to see if it is quicker, eh?

I load into staging tables with no constraints (or maybe a PK if I trust the data source). The data is then cleansed and upserted (insert new records, update existing ones) into the database proper.

It sounds, however, like you are working through these tables one record at a time, in a loop or cursor. Is this correct? If so, you will get an immense performance gain by doing this in a set-based manner.
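For example, a minimal sketch of the staging-table load might look like the following. Every name here (table, columns, file and format file paths) is a placeholder rather than anything from the thread, and a fixed-length file needs a bcp format file describing the column widths:

-- staging table with no constraints, reloaded each month
CREATE TABLE dbo.Staging_Table
(
    pk1  INT         NOT NULL,
    col1 VARCHAR(50) NULL,
    col2 VARCHAR(50) NULL
)

-- bulk load the fixed-length file; the format file maps the fixed column widths
BULK INSERT dbo.Staging_Table
FROM 'C:\data\monthly_file.txt'
WITH
(
    FORMATFILE = 'C:\data\monthly_file.fmt',
    TABLOCK,             -- table lock can allow a faster, minimally logged load
    BATCHSIZE = 100000   -- commit in batches instead of one huge transaction
)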
 |
sqlserver_newbee
Starting Member
11 Posts
Posted - 2007-07-20 : 10:18:12
Can you tell me where I can find BoL, so that I can look it up? I am aware of the BULK INSERT T-SQL query and I am going to try it out to check the time efficiency.

Yes, I am doing one record at a time. Can you tell me how to achieve this in a "set based manner"? If there is any article, please direct me to it as well.

Thanks a lot!
 |
rmiao
Master Smack Fu Yak Hacker
7266 Posts
Posted - 2007-07-20 : 11:09:35
BoL is Books Online, the SQL Server documentation. It is installed on the SQL Server and client machines by default.
 |
pootle_flump
1064 Posts
Posted - 2007-07-20 : 17:22:46
quote: Originally posted by sqlserver_newbee

Yes I am doing one record at a time. Can you tell me how to achieve this in a "set based manner"? If there is any article, please direct me to it as well.
Ok - I presume you are not totally familiar with SQL and perhaps have a background in procedural programming? See how you get on with this tutorial. The short of it is that SQL deals in sets. As such, rather than performing an action n times on n rows, you perform the action once on a set of rows. One of the MVPs here likens it to putting sugar in your tea either a grain at a time with tweezers or a lump at a time.

See how you get on. If you already know all this, please let me know and we'll move on from there.

http://www.w3schools.com/sql/default.asp
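To make the difference concrete, here is a rough sketch (table and column names are made up, not taken from the thread) of the same update done row by row with a cursor and then once as a set:

-- row by row: one UPDATE per key inside a cursor loop (slow for millions of rows)
DECLARE @pk1 INT
DECLARE key_cursor CURSOR FOR
    SELECT pk1 FROM dbo.Staging_Table
OPEN key_cursor
FETCH NEXT FROM key_cursor INTO @pk1
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.Actual_Table
    SET col1 = (SELECT col1 FROM dbo.Staging_Table WHERE pk1 = @pk1)
    WHERE pk1 = @pk1
    FETCH NEXT FROM key_cursor INTO @pk1
END
CLOSE key_cursor
DEALLOCATE key_cursor

-- set based: one UPDATE handles every matching row in a single statement
UPDATE a
SET a.col1 = s.col1
FROM dbo.Actual_Table a
INNER JOIN dbo.Staging_Table s ON a.pk1 = s.pk1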
 |
sqlserver_newbee
Starting Member
11 Posts
Posted - 2007-07-23 : 11:52:53
Hey Pootle_flump.. Thanks a lot for your help. I have figured out the solution to my problem and I am halfway done putting my data into the tables :)

I just performed 4 very simple queries which did the job:

1. Put the data into staging tables with just the primary key constraints.

2. Update query (the inner join got me all the common rows and updated them in the actual table):

Update Actual_Table
Set ()
From Actual_Table
Inner Join Staging_Table ON Actual_Table.pk1 = Staging_Table.pk1

3. Delete the results of the inner join from the staging table, leaving just the unique rows for insert:

Delete Staging_Table
From Staging_Table
Inner Join Actual_Table ON Actual_Table.pk1 = Staging_Table.pk1

4. Insert all the unique rows:

Insert INTO Actual_Table ()
Select ()
From Staging_Table

What are your views on this solution? Would you suggest looking for a more optimal solution?

Once again, thanks for the help!
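For readers who want to see the pattern with the column lists filled in, a hypothetical version might look like the one below. The columns pk1, col1 and col2 are purely illustrative, since the real column lists were omitted from the post:

-- 2. update rows that already exist in the actual table (column names are illustrative only)
UPDATE a
SET a.col1 = s.col1,
    a.col2 = s.col2
FROM Actual_Table a
INNER JOIN Staging_Table s ON a.pk1 = s.pk1

-- 3. remove the matched rows from staging, leaving only the new keys
DELETE s
FROM Staging_Table s
INNER JOIN Actual_Table a ON a.pk1 = s.pk1

-- 4. insert the remaining (new) rows
INSERT INTO Actual_Table (pk1, col1, col2)
SELECT pk1, col1, col2
FROM Staging_Table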
 |
pootle_flump
1064 Posts
Posted - 2007-07-23 : 15:03:47
That's a good job. Typically you would combine 3 & 4:

INSERT INTO dbo.actual_table ()
SELECT ()
FROM dbo.staging_table
WHERE NOT EXISTS
    (SELECT NULL FROM dbo.actual_table WHERE dbo.actual_table.pk1 = dbo.staging_table.pk1)

This just inserts the records from the staging table that do not yet exist in the actual table.
 |
X002548
Not Just a Number
15586 Posts
sqlserver_newbee
Starting Member
11 Posts
Posted - 2007-07-23 : 16:04:25
Yeah, I could combine 3 and 4, Pootle_flump. That might save me considerable execution time given the size of the data files in my case.

Thanks for the blog link, x002548. It was excellent information on what I am trying to achieve.

Thank you very much guys! I will post back if I run into problems.
 |