Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 bulk insert vs openrowset

Author  Topic 

mambion
Starting Member

5 Posts

Posted - 2010-07-27 : 17:08:52
I intend to import csv files to SQL Server. Which would be a better option: bulk insert or openrowset. Would anyone be able to elaborate what are the disadvantages and advantages of each?

We have a current implementation wherein we use bulk insert to load files to SQL Server via temp tables before loading them to their actual tables. We do this because the actual tables have other fields that are system-defined rather than coming from the file. We intend to improve this to make the load faster but not sacrificing flexibility to import files of different formats. One way is using openrowset bulk to bypass loading to temp tables and having format files to define the fields. Isn't this the same as using bulk insert?

X002548
Not Just a Number

15586 Posts

Posted - 2010-07-27 : 17:14:22
Yes, using a format file will do this. Bulk Insert is also faster, and bcp may be faster still

HOWEVER

I am all for loading Permanent Staging Tables, not temp tables, and performing data validation and auditing BEFORE loading the final table



Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam



Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-07-27 : 17:15:20
You have to use the BULK option of OPENROWSET to achieve it, but yes it's similar/same to BULK INSERT.

OPENROWSET is blocked by default though, so it would need to enabled via sp_configure. It is blocked by Microsoft by default for security reasons.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

mambion
Starting Member

5 Posts

Posted - 2010-07-27 : 18:29:38
Thanks for the response. Just to add more info, the actual tables I mentioned are actually staging tables, but with additional fields not found in the file. In loading to a temporary table, our current implementation has an SP that uses Dynamic SQL to first assemble the query string that would create the temp table by retrieving the columns of the actual table from syscolumns. This is one overhead that we accepted because we have the flexibility of calling the same SP to load data from different files of different format. Thus, for every new file of different format, we only need to create new actual tables and then use the same SP to populate from the file, to a temp table, before transferring to actual. Using a format file, though, would definitely eliminate using temp tables but would give admin team additional maintenance of managing new format files in addition to the actual tables. If we use OPENROWSET, I know we can achieve bypassing load to temp tables, but is it possible to include system-defined fields, such as current date for example, in the SELECT statement so we can populate the actual tables at one go? If we use BULK INSERT, how can this be implemented and will this option be faster?
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-07-27 : 19:37:36
Are your temporary tables of the # variety or are they regular tables? You can use BULK INSERT from within a stored procedure and don't need a format file. A format file just helps force the format. You can even call BULK INSERT via dynamic SQL so that you can reference your new table that was created.

BULK INSERT is the fastest method.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-07-28 : 00:53:59
"so we can populate the actual tables at one go?"

For me the benefit of staging tables is when something goes wrong.

Our import from external sources via staging tables to actual tables is pretty complex, but it assumes that, sooner or later, the data from the external source will be rubbish!

Just yesterday we had a failure because an external source provided duplicate data - two rows for the same primary key. Something that is, apparently!, absolutely impossible at the far end - but someone twiddled with something ...

With staging tables it was easy to report on what the duff data was, change the routine to make the import to "actual tables" ignore any duplicate rows, and log an error for the duplicates. And we've changed the template we use for such imports to do it that way round in future (like so many of these type of things I'm actually amazed it hasn't happened before)

We now have several of these "abstracted" processes in our template.

Instead of

INSERT INTO MyActualTable(Col1, Col2, ...)
SELECT Col1, Col2, ...
FROM MyStagingTable AS T
WHERE T.ImportStatus = 1 -- 1=OK to import if not already exists
AND NOT EXISTS (SELECT * FROM MyActualTable AS A WHERE A.MyPK = T.MyPK)

we now have

INSERT INTO MyActualTable(Col1, Col2, ...)
SELECT Col1, Col2, ...

UPDATE T
SET T.ImportStatus = 2

FROM MyStagingTable AS T
WHERE T.ImportStatus = 1
AND NOT EXISTS (SELECT * FROM MyActualTable AS A WHERE A.MyPK = T.MyPK)

UPDATE T
SET T.ImportStatus = 3,
T.ErrMsg = 'Duplicate PK during INSERT'
FROM MyStagingTable AS T
WHERE T.ImportStatus = 2
AND MyPK IN (SELECT MyPK FROM MyStagingTable GROUP BY MyPK HAVING COUNT(*) > 1)

INSERT INTO MyActualTable(Col1, Col2, ...)
SELECT Col1, Col2, ...
FROM MyStagingTable AS T
WHERE T.ImportStatus = 2
AND NOT EXISTS (SELECT * FROM MyActualTable AS A WHERE A.MyPK = T.MyPK)

Go to Top of Page

mambion
Starting Member

5 Posts

Posted - 2010-07-28 : 17:19:37
Our temporary table is of the # variety, and these are created dynamically in tempdb. Thus, we use dynamic SQL as well. Current implementation already uses BULK INSERT to #temp tables with no format file. Afterwards, we use INSERT INTO StagingTable from #temp table with additional fields included in the SELECT query such as businessdate and inputfilename. Unfortunately, we find this rather slow considering the huge volume of data we have. We certainly would improve things if we do away with the #temp tables. Based on the discussions above, what I'm getting is that it seems like our options are to use OPENROWSET with format file or BULK INSERT with format file. Using OPENROWSET, performance would be less appealing as compared to BULK INSERT. If we use BULK INSERT, the additional fields we require cannot be included in the bulk load process and may have to be done as a separate transaction. We're thinking of having the format file created dynamically as well by querying the table schema from syscolumns. Are there other options we can use so we can improve the performance of our load process to staging tables? Will it also be better if we use SSIS instead to encapsulate the dynamic creation of the format file and loading to staging tables as a single process?

I appreciate everyone's feedback here. You've been very helpful.
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2010-07-28 : 17:30:14
bcp the data into a permanet staging table....or didn't I say that already

faster still, put the staging tables into their own filegroup, preferably on a phyically separate drive



Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam



Go to Top of Page

mambion
Starting Member

5 Posts

Posted - 2010-08-01 : 17:28:08
I think I got what I need. Thank you so much for the feedback.
Go to Top of Page

xyvyx
Starting Member

6 Posts

Posted - 2011-12-19 : 12:18:32
So can somebody confirm that the OPENROWSET(BULK...) method vs. BULK INSERT is severely limited?
The MSDN documentation is rather unclear even though it lumps the two methods together. On the following MSDN "About Bulk Import and Bulk Export Operations" page, the details of both methods link to the same page:

http://msdn.microsoft.com/en-us/library/ms187042.aspx

The BULK INSERT statement supports a WITH clause with numerous options but the OPENROWSET(BULK...) doesn't seem to support any of these things w/o use of a format file. So the most basic features like FIELDTERMINATOR and ROWTERMINATOR aren't supported without the use of a format file... which really goes against the ad-hoc usage of OPENROWSET, eh?

There are numerous articles I've found that detail some "workarounds" to read in basic CSV data.... generally, the BULK INSERT statement could handle these just fine, but we often just need a quick way to read this data into a table without knowing or spending the time to define all the columns. The OPENROWSET method could do just this for tab-delimited data & it support a couple parameters, but why didn't it include any of the other format/layout options of it's cousin?

"If we can put it into a table, we can kill it"
-Ahnold Sequelnator
Go to Top of Page
   

- Advertisement -