Using BULK INSERT to Load a Text File
By Garth Wells
on 19 March 2001
| 63 Comments
| Tags: bcp/Bulk Insert
This example combines dynamic SQL, BULK INSERT and the proper handling of double-quotes to solve a client's problem with loading various text file formats into a database. (This article has been updated through SQL Server 2005.)
One of my clients contacted
me recently and said they needed some help creating a stored
procedure that imported data from a text file. They wanted
the procedure to accept three parameters: PathFileName,
OrderID, and FileType. The PathFileName is simply the name
and physical location of the source file on the hard drive,
the OrderID is generated in the program that calls the procedure
and the FileType indicates the format of the data in the
source file. The two possible formats for the source data
are shown here:
FileType=1 (TxtFile1.txt)
"Kelly","Reynold","kelly@reynold.com"
"John","Smith","bill@smith.com"
"Sara","Parker","sara@parker.com"
FileType=2 (TxtFile2.txt)
Kelly,Reynold,kelly@reynold.com
John,Smith,bill@smith.com
Sara,Parker,sara@parker.com
BULK INSERT
I decided to use BULK INSERT to implement the solution.
The BULK INSERT statement was introduced in SQL Server 7
and allows you to interact with bcp (bulk copy program)
via a script. In pre-7 versions the only way you could access
bcp functionality was from a command prompt. I am not going
to list the full syntax of BULK INSERT here (but you can
find it
here), because it is a little long and most of it does
not apply to the problem I am solving. Instead, I will show
the valid BULK INSERT statements used to load the data shown
above.
BULK INSERT TmpStList FROM 'c:\TxtFile1.txt' WITH (FIELDTERMINATOR = '","')
TmpStList is the target table and TxtFile1.txt is the
source data file. The source file is located in the root
of the C drive. The FIELDTERMINATOR argument allows you
to specify the delimeter used to discern column values.
The valid statement for FileType=2 is shown here:
BULK INSERT tmpStList FROM 'c:\TxtFile2.txt' WITH (FIELDTERMINATOR = ',')
The only difference is the value of the FIELDTERMINATOR
argument.
The Solution
The stored procedure used to implement the solution is
fairly straight forward once you master the BULK INSERT statement.
The only real trick is loading the data that comes to you
in FileType=1 format. Because a double-quote starts and
ends a data row, it too is loaded in the table. The FIELDTERMINATOR
works between columns, not at the beginning or
end of a row. To workaround this I simply load the data
into a temporary table and then use a CASE statement and
the SUBSTRING and DATALENGTH functions to load the correct
data in the final table. The FileType=2 data will load as-is,
but I still put in the temporary table for consistency (easier
programming).
The SQL statements that create the temporary and final
table are shown here.
CREATE TABLE StudentList
(
StID int IDENTITY NOT NULL,
StFName varchar(50) NOT NULL,
StLName varchar(50) NOT NULL,
StEmail varchar(100) NOT NULL,
OrderID int NOT NULL
)
go
CREATE TABLE TmpStList
(
stFName varchar (50) NOT NULL,
stLName varchar (50) NOT NULL,
stEmail varchar (100) NOT NULL
)
go
The procedure used to implement the data loading is shown
here.
SET QUOTED_IDENTIFIER OFF
go
CREATE PROCEDURE ps_StudentList_Import
@PathFileName varchar(100),
@OrderID integer,
@FileType tinyint
AS
--Step 1: Build Valid BULK INSERT Statement
DECLARE @SQL varchar(2000)
IF @FileType = 1
BEGIN
-- Valid format: "John","Smith","john@smith.com"
SET @SQL = "BULK INSERT TmpStList FROM '"+@PathFileName+"' WITH (FIELDTERMINATOR = '"",""') "
END
ELSE
BEGIN
-- Valid format: John,Smith,john@smith.com
SET @SQL = "BULK INSERT TmpStList FROM '"+@PathFileName+"' WITH (FIELDTERMINATOR = ',') "
END
--Step 2: Execute BULK INSERT statement
EXEC (@SQL)
--Step 3: INSERT data into final table
INSERT StudentList (StFName,StLName,StEmail,OrderID)
SELECT CASE WHEN @FileType = 1 THEN SUBSTRING(StFName,2,DATALENGTH(StFName)-1)
ELSE StFName
END,
SUBSTRING(StLName,1,DATALENGTH(StLName)-0),
CASE WHEN @FileType = 1 THEN SUBSTRING(StEmail,1,DATALENGTH(StEmail)-1)
ELSE StEmail
END,
@OrderID
FROM tmpStList
--Step 4: Empty temporary table
TRUNCATE TABLE TmpStList
go
The first thing you need to know is that SET QUOTED_IDENTIFIER
is set to OFF because double-quotes are used to set the
value of a variable. Dynamic SQL is used to create the BULK
INSERT statement on-the-fly, and double-quotes are required
to do this. The final BULK INSERT statement is
a function of both the @PathFileName and @FileType parameters.
Once built, it is executed with the EXEC() statement and
the source data is loaded into the temporary table.
Once the data is in TmpStList, the next step is to load
it into the final table. I use the CASE statement to determine
the value in the @FileType parameter and manipulate accordingly.
When @FileType=1, the SUBSTRING and DATALENGTH functions
are used to remove the double-quotes from the StFName and
StEmail columns. When FileType=2, the data is loaded as
is and no manipulation is required.
After the data is loaded I empty the temporary table
with the TRUNCATE TABLE statement. I could have used DELETE
to accomplish this, but TRUNCATE TABLE has less of an impact
on the transaction log.
The following shows the way to call the procedure specifying
a different FileType value for each call.
EXEC ps_StudentList_Import 'c:\TxtFile1.txt',1, 1
EXEC ps_StudentList_Import 'c:\TxtFile2.txt',1, 2