Return to Using DTS to Automate a Data Import Process
Using DTS to Automate a Data Import Process
Written by Garth Wells on 11 August 2002
In this article I show you how to create a DTS Package that can be used to automate a file download/import process. It's a little longer than the other articles I have written for SQLTeam and will require some work on your part if you want to implement it, but I think you will find it interesting.
I recently re-wrote a database application that gets daily data feeds via text
files that are uploaded to an ftp site. My client's customers generate the text
files from their mainframe systems and upload them to the ftp site for access.
The files are:
- Downloaded from the ftp site
- Loaded into the database
- Renamed, and
- Copied to a new location so they can be archived
Automating the Process with DTS
I do not like data loading processes that are dependent on human intervention,
so during the re-write I wanted to replace this process with one that could be
scheduled and run automatically. I remembered that DTS (Data Transformation
Services) has a task that supports ftp, so I decided to investigate this
approach. It could not have been easier to implement.
DTS: A Brief Introduction
If you have never worked with DTS, you have missed using one of the most useful
tools that come with SQL Server. DTS makes it so easy to get data in/out of SQL
Server, that it almost makes up for all the years developers had to put up with
the limited functionality of bcp (bulk copy program). If you have never used
bcp, consider yourself lucky.
The main purpose of DTS is to allow you to define connections to various data
sources and destinations and pump the data from one to the other. For example,
you can define a text file as a data source, an instance of SQL Server as the
destination, and then pump the data from one to the other. DTS supports numerous
data sources, all of which can be seen by opening a new package and clicking the
Connection option. To open a new package, expand the Server tree of the target
SQL Server and click the Data Transformation Services folder. You can now
right-click on Local Packages and select New Package. At this point you define
the connections and add any other tasks that need to be completed during the
transfer process.
DTS allows you to include several other tasks that can be executed before and
after the data transfer process. For example, the Process task will execute an
.exe or .bat file, and the ActiveX Script Task allows you to execute either
JScript or VBScript. And of course there is the File Transfer Protocol Task,
which is the main focus of this article. The other tasks supported can be seen
by clicking Task on the menu.
Creating the Pieces
I needed a DTS package that would do the following:
- Connect to an ftp site and download a specific file
- Transfer the data from the file into a SQL Server table
- Rename and move the file to an archive directory
- Scrub/load the data into the destination table
Before we can create a package that does this, we need some source data, two
destination tables, and a stored procedure. We will use a fixed-width text file
that conforms to the following layout for the data source. You can use your
favorite text editor to create the file.
Column Width
Name 40
Phone 10
My sample data looks like this:
Kelly Smith 8885551212
Paul Jones 8885551213
Mike Andrews 8885551214
Creating the tables is the next step. You may be wondering why I want to create
two destination tables. The first one is a temporary destination that holds the
data in the exact same format as the text file, and the second one is the data's
final destination--how it appears after being scrubbed. The tables can be
created with the following DDL.
CREATE TABLE ContactsImport
(
CI_Name char(40) NOT NULL,
CI_Phone char(10) NOT NULL,
CI_Processed char(1) DEFAULT 'N'
)
go
CREATE TABLE Contacts
(
CN_ID int IDENTITY NOT NULL PRIMARY KEY,
CN_FName varchar(20) NOT NULL,
CN_LName varchar(20) NOT NULL,
CN_Phone varchar(10) NOT NULL
)
I learned to use the two table approach during my first experience with DTS in
SQL Server 7. You can actually scrub the data while it is in transit using
VBScript or JScript, but that slowed down the import significantly when large
files were processed. The faster approach (in my experience) is to import the
data as-is, and process it with a T-SQL inside the database. Of course, when
large files are processed you do need to make sure that you have the hard drive
space to accommodate the extra copy of the data.
The stored procedure used to rename/move the source file and scrub the data is
shown here.
CREATE PROCEDURE pr_ContactsImport
AS
-- Part I: Move/rename source file
DECLARE @fn char(6),
@cmd varchar(100)
SET @fn=REPLACE(convert(char(8),getdate(),1),'/','')
SET @cmd ='Move c:\articles\contacts.txt c:\articles\archive\'+@fn+'.txt'
EXEC master..xp_cmdshell @cmd
-- End of Part I: Move/rename source file
-- Part II: Scrub the data
BEGIN TRANSACTION
INSERT Contacts (CN_FName,CN_LName,CN_Phone)
SELECT SUBSTRING(CI_Name,1,CHARINDEX(CHAR(32),CI_Name)-1),
SUBSTRING(CI_Name,CHARINDEX(CHAR(32),CI_Name)+1,DATALENGTH(CI_Name)),
CI_Phone
FROM ContactsImport
WHERE CI_Processed='N'
IF @@Error <> 0
GOTO ErrorHandler
UPDATE ContactsImport SET CI_Processed='Y'
IF @@Error <> 0
GOTO ErrorHandler
COMMIT TRANSACTION
ErrorHandler:
IF @@TRANCOUNT > 0
ROLLBACK TRANSACTION
RETURN
-- End of Part II: Scrub the data
go
To facilitate the explanation I have separated the code into two Parts. The code
in Part I moves/renames the file using xp_cmdshell. xp_cmdshell is an extended
stored procedure that is used to execute an O/S command. I use the CONVERT and
REPLACE functions to create the new file name that conforms to a mmddyy.txt
convention and then execute the Move command. The source directory is
C:\Articles and the archiving directory is: C:\Articles\Archive.
NOTE: A user must be granted Execute permission on xp_cmdshell unless they are a
member of the Sysadmin fixed server role.
In Part II I scrub the data. The contact name is stored in one column in the
data source (text file), but I want to store it in two in the Contacts table.
The SUBSTRING and CHARINDEX functions allow me to find the space between the
first and last name and parse accordingly.
For my testing I executed the above DDL in tempdb.
Creating the Package
Now that we have all the pieces in place we can create the package. Start a new
package using the instructions in the DTS: A Brief Introduction section, and
complete the following:
- Click Task on the menu and select File Transfer Protocol Task
- Make sure Internet Site is selected as the source and then specify the
connection information for the ftp site
- Input the Destination directory
- Click the Files tab and a list of files on the site will be displayed
- Double-click the source file and click OK
At this point you can test the task by right-clicking on it and selecting
Execute Step. The file will be moved to the destination directory.
The next step is to create the Transform Data Task that transfers the data from
the text file into the ContactsImport table. Before you can create this task,
you must define the source/destination connections. Create the source connection
by completing the following:
- Click Connection on the menu and select Text File (Source).
- Specify the File name (and location) and click OK.
- In the Text File Properties dialog ensure that the Fixed Field is selected
and click Next.
- If the data is aligned per the file format specified above, the column marker
(black line with arrow) will automatically be placed at the 40 position and the
end-of-file marker (red line) will be at 50. Click Finish and OK to complete the
connection definition.
Now we need to create the connection for the destination. Complete the following
to do this.
- Click Connection on the menu and select Microsoft OLE DB Provider for SQL
Server
- Specify the server name and connection info and ensure tempdb (or what ever
database you are using) is selected as the target database
The source/destination connections are defined, so let's create the Transform
Data Task with the following:
- Click Tasks on the menu and select Transform Data Task.
- Click the source connection and then the destination connection. An arrow is
created that links the two connections.
- Double-click the arrow and select the Destination tab. On my server, the
Contacts table is selected by default because it is the first in the alpha sort
of the table list for tempdb. Select the ContactsImport table and click the
Transformations tab. Notice the arrows correctly map the source and destination
columns.
- Click OK to close the dialog.
When you have multiple tasks within a package they will execute in parallel
unless you define the order of execution. In this case we want the File Transfer
Protocol Task to execute first, and if that is successful we want to execute the
Transform Data Task. We can ensure this happens by completing the following:
- Click on the File Transfer Protocol Task
- Press the Control key and select the Text File (Source) connection
- Click Workflow from the menu and select On Success
The last steps in this process are to archive the source file and scrub the
data. We do this by executing the pr_ContactsImport procedure with an Execute
SQL Task. Create this task by completing the following:
- Click Task on the menu and select Execute SQL Task
- Select Microsoft OLE DB Provider for SQL Server in the Existing Connection
drop-down
- In the SQL Statement input type: EXEC pr_ContactsImport
- Click OK to close the dialog
Make sure that it executes after the Transform Data Task by creating an On
Success Workflow like we have done twice before.
The package is complete and you can execute it by clicking Package from the menu
and selecting Execute. You should also be aware that you can schedule the
package to execute automatically. Simply right-click on the package in the Local
Packages view and select Schedule Package. For my project I knew that the source files would be uploaded no later than 10 AM, so I scheduled accordingly. After the data is loaded an email is automatically sent that informs the relevant personnel that the process completed successfully.
Final Thoughts
DTS is a great tool. If you have never worked with it I strongly encourage you
start experimenting with different methods of transferring/transforming data. I
am sure having an understanding of the functionality it supports will benefit
you in the long run.
Garth
www.SQLBook.com
|