SQL Server Forums
Profile | Register | Active Topics | Members | Search | Forum FAQ
 
Register Now and get your question answered!
Username:
Password:
Save Password
Forgot your Password?

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 bulk insert vs openrowset
 New Topic  Reply to Topic
 Printer Friendly
Author Previous Topic Topic Next Topic  

mambion
Starting Member

5 Posts

Posted - 07/27/2010 :  17:08:52  Show Profile  Reply with Quote
I intend to import csv files to SQL Server. Which would be a better option: bulk insert or openrowset. Would anyone be able to elaborate what are the disadvantages and advantages of each?

We have a current implementation wherein we use bulk insert to load files to SQL Server via temp tables before loading them to their actual tables. We do this because the actual tables have other fields that are system-defined rather than coming from the file. We intend to improve this to make the load faster but not sacrificing flexibility to import files of different formats. One way is using openrowset bulk to bypass loading to temp tables and having format files to define the fields. Isn't this the same as using bulk insert?

X002548
Not Just a Number

15586 Posts

Posted - 07/27/2010 :  17:14:22  Show Profile  Reply with Quote
Yes, using a format file will do this. Bulk Insert is also faster, and bcp may be faster still

HOWEVER

I am all for loading Permanent Staging Tables, not temp tables, and performing data validation and auditing BEFORE loading the final table



Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam



Go to Top of Page

tkizer
Almighty SQL Goddess

USA
36997 Posts

Posted - 07/27/2010 :  17:15:20  Show Profile  Visit tkizer's Homepage  Reply with Quote
You have to use the BULK option of OPENROWSET to achieve it, but yes it's similar/same to BULK INSERT.

OPENROWSET is blocked by default though, so it would need to enabled via sp_configure. It is blocked by Microsoft by default for security reasons.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

mambion
Starting Member

5 Posts

Posted - 07/27/2010 :  18:29:38  Show Profile  Reply with Quote
Thanks for the response. Just to add more info, the actual tables I mentioned are actually staging tables, but with additional fields not found in the file. In loading to a temporary table, our current implementation has an SP that uses Dynamic SQL to first assemble the query string that would create the temp table by retrieving the columns of the actual table from syscolumns. This is one overhead that we accepted because we have the flexibility of calling the same SP to load data from different files of different format. Thus, for every new file of different format, we only need to create new actual tables and then use the same SP to populate from the file, to a temp table, before transferring to actual. Using a format file, though, would definitely eliminate using temp tables but would give admin team additional maintenance of managing new format files in addition to the actual tables. If we use OPENROWSET, I know we can achieve bypassing load to temp tables, but is it possible to include system-defined fields, such as current date for example, in the SELECT statement so we can populate the actual tables at one go? If we use BULK INSERT, how can this be implemented and will this option be faster?
Go to Top of Page

tkizer
Almighty SQL Goddess

USA
36997 Posts

Posted - 07/27/2010 :  19:37:36  Show Profile  Visit tkizer's Homepage  Reply with Quote
Are your temporary tables of the # variety or are they regular tables? You can use BULK INSERT from within a stored procedure and don't need a format file. A format file just helps force the format. You can even call BULK INSERT via dynamic SQL so that you can reference your new table that was created.

BULK INSERT is the fastest method.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

Kristen
Test

United Kingdom
22403 Posts

Posted - 07/28/2010 :  00:53:59  Show Profile  Reply with Quote
"so we can populate the actual tables at one go?"

For me the benefit of staging tables is when something goes wrong.

Our import from external sources via staging tables to actual tables is pretty complex, but it assumes that, sooner or later, the data from the external source will be rubbish!

Just yesterday we had a failure because an external source provided duplicate data - two rows for the same primary key. Something that is, apparently!, absolutely impossible at the far end - but someone twiddled with something ...

With staging tables it was easy to report on what the duff data was, change the routine to make the import to "actual tables" ignore any duplicate rows, and log an error for the duplicates. And we've changed the template we use for such imports to do it that way round in future (like so many of these type of things I'm actually amazed it hasn't happened before)

We now have several of these "abstracted" processes in our template.

Instead of

INSERT INTO MyActualTable(Col1, Col2, ...)
SELECT Col1, Col2, ...
FROM MyStagingTable AS T
WHERE T.ImportStatus = 1 -- 1=OK to import if not already exists
AND NOT EXISTS (SELECT * FROM MyActualTable AS A WHERE A.MyPK = T.MyPK)

we now have

INSERT INTO MyActualTable(Col1, Col2, ...)
SELECT Col1, Col2, ...
UPDATE T
SET T.ImportStatus = 2
FROM MyStagingTable AS T
WHERE T.ImportStatus = 1
AND NOT EXISTS (SELECT * FROM MyActualTable AS A WHERE A.MyPK = T.MyPK)

UPDATE T
SET T.ImportStatus = 3,
    T.ErrMsg = 'Duplicate PK during INSERT'
FROM MyStagingTable AS T
WHERE T.ImportStatus = 2
AND MyPK IN (SELECT MyPK  FROM MyStagingTable GROUP BY MyPK HAVING COUNT(*) > 1)

INSERT INTO MyActualTable(Col1, Col2, ...)
SELECT Col1, Col2, ...
FROM MyStagingTable AS T
WHERE T.ImportStatus = 2
AND NOT EXISTS (SELECT * FROM MyActualTable AS A WHERE A.MyPK = T.MyPK)


Edited by - Kristen on 07/28/2010 01:07:43
Go to Top of Page

mambion
Starting Member

5 Posts

Posted - 07/28/2010 :  17:19:37  Show Profile  Reply with Quote
Our temporary table is of the # variety, and these are created dynamically in tempdb. Thus, we use dynamic SQL as well. Current implementation already uses BULK INSERT to #temp tables with no format file. Afterwards, we use INSERT INTO StagingTable from #temp table with additional fields included in the SELECT query such as businessdate and inputfilename. Unfortunately, we find this rather slow considering the huge volume of data we have. We certainly would improve things if we do away with the #temp tables. Based on the discussions above, what I'm getting is that it seems like our options are to use OPENROWSET with format file or BULK INSERT with format file. Using OPENROWSET, performance would be less appealing as compared to BULK INSERT. If we use BULK INSERT, the additional fields we require cannot be included in the bulk load process and may have to be done as a separate transaction. We're thinking of having the format file created dynamically as well by querying the table schema from syscolumns. Are there other options we can use so we can improve the performance of our load process to staging tables? Will it also be better if we use SSIS instead to encapsulate the dynamic creation of the format file and loading to staging tables as a single process?

I appreciate everyone's feedback here. You've been very helpful.
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 07/28/2010 :  17:30:14  Show Profile  Reply with Quote
bcp the data into a permanet staging table....or didn't I say that already

faster still, put the staging tables into their own filegroup, preferably on a phyically separate drive



Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam



Go to Top of Page

mambion
Starting Member

5 Posts

Posted - 08/01/2010 :  17:28:08  Show Profile  Reply with Quote
I think I got what I need. Thank you so much for the feedback.
Go to Top of Page

xyvyx
Starting Member

USA
6 Posts

Posted - 12/19/2011 :  12:18:32  Show Profile  Reply with Quote
So can somebody confirm that the OPENROWSET(BULK...) method vs. BULK INSERT is severely limited?
The MSDN documentation is rather unclear even though it lumps the two methods together. On the following MSDN "About Bulk Import and Bulk Export Operations" page, the details of both methods link to the same page:

http://msdn.microsoft.com/en-us/library/ms187042.aspx

The BULK INSERT statement supports a WITH clause with numerous options but the OPENROWSET(BULK...) doesn't seem to support any of these things w/o use of a format file. So the most basic features like FIELDTERMINATOR and ROWTERMINATOR aren't supported without the use of a format file... which really goes against the ad-hoc usage of OPENROWSET, eh?

There are numerous articles I've found that detail some "workarounds" to read in basic CSV data.... generally, the BULK INSERT statement could handle these just fine, but we often just need a quick way to read this data into a table without knowing or spending the time to define all the columns. The OPENROWSET method could do just this for tab-delimited data & it support a couple parameters, but why didn't it include any of the other format/layout options of it's cousin?

"If we can put it into a table, we can kill it"
-Ahnold Sequelnator
Go to Top of Page
  Previous Topic Topic Next Topic  
 New Topic  Reply to Topic
 Printer Friendly
Jump To:
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC Go To Top Of Page
This page was generated in 0.08 seconds. Powered By: Snitz Forums 2000