Using DTS to Automate a Data Import Process

By Garth Wells on 11 August 2002 | Tags: DTS


In this article I show you how to create a DTS Package that can be used to automate a file download/import process. It's a little longer than the other articles I have written for SQLTeam and will require some work on your part if you want to implement it, but I think you will find it interesting.

I recently re-wrote a database application that gets daily data feeds via text files that are uploaded to an ftp site. My client's customers generate the text files from their mainframe systems and upload them to the ftp site for access. The files are:
  1. Downloaded from the ftp site
  2. Loaded into the database
  3. Renamed, and
  4. Copied to a new location so they can be archived

Automating the Process with DTS

I do not like data loading processes that are dependent on human intervention, so during the re-write I wanted to replace this process with one that could be scheduled and run automatically. I remembered that DTS (Data Transformation Services) has a task that supports ftp, so I decided to investigate this approach. It could not have been easier to implement.

DTS: A Brief Introduction

If you have never worked with DTS, you have missed using one of the most useful tools that come with SQL Server. DTS makes it so easy to get data in/out of SQL Server, that it almost makes up for all the years developers had to put up with the limited functionality of bcp (bulk copy program). If you have never used bcp, consider yourself lucky.

The main purpose of DTS is to allow you to define connections to various data sources and destinations and pump the data from one to the other. For example, you can define a text file as a data source, an instance of SQL Server as the destination, and then pump the data from one to the other. DTS supports numerous data sources, all of which can be seen by opening a new package and clicking the Connection option. To open a new package, expand the Server tree of the target SQL Server and click the Data Transformation Services folder. You can now right-click on Local Packages and select New Package. At this point you define the connections and add any other tasks that need to be completed during the transfer process.

DTS allows you to include several other tasks that can be executed before and after the data transfer process. For example, the Process task will execute an .exe or .bat file, and the ActiveX Script Task allows you to execute either JScript or VBScript. And of course there is the File Transfer Protocol Task, which is the main focus of this article. The other tasks supported can be seen by clicking Task on the menu.

Creating the Pieces

I needed a DTS package that would do the following:

  1. Connect to an ftp site and download a specific file
  2. Transfer the data from the file into a SQL Server table
  3. Rename and move the file to an archive directory
  4. Scrub/load the data into the destination table

Before we can create a package that does this, we need some source data, two destination tables, and a stored procedure. We will use a fixed-width text file that conforms to the following layout for the data source. You can use your favorite text editor to create the file.

Column  	Width
Name            40
Phone           10

My sample data looks like this:

Kelly Smith                             8885551212
Paul Jones                              8885551213
Mike Andrews                            8885551214

Creating the tables is the next step. You may be wondering why I want to create two destination tables. The first one is a temporary destination that holds the data in the exact same format as the text file, and the second one is the data's final destination--how it appears after being scrubbed. The tables can be created with the following DDL.

CREATE TABLE ContactsImport
(
 CI_Name char(40) NOT NULL,
 CI_Phone char(10) NOT NULL,
 CI_Processed char(1) DEFAULT 'N'
)
go
CREATE TABLE Contacts
(
 CN_ID int IDENTITY NOT NULL PRIMARY KEY,
 CN_FName varchar(20) NOT NULL,
 CN_LName varchar(20) NOT NULL,
 CN_Phone varchar(10) NOT NULL
)

I learned to use the two table approach during my first experience with DTS in SQL Server 7. You can actually scrub the data while it is in transit using VBScript or JScript, but that slowed down the import significantly when large files were processed. The faster approach (in my experience) is to import the data as-is, and process it with a T-SQL inside the database. Of course, when large files are processed you do need to make sure that you have the hard drive space to accommodate the extra copy of the data.

The stored procedure used to rename/move the source file and scrub the data is shown here.

CREATE PROCEDURE pr_ContactsImport
AS

-- Part I: Move/rename source file
DECLARE @fn char(6),
        @cmd varchar(100)

SET @fn=REPLACE(convert(char(8),getdate(),1),'/','')
SET @cmd ='Move c:\articles\contacts.txt c:\articles\archive\'+@fn+'.txt'
EXEC master..xp_cmdshell @cmd
-- End of Part I: Move/rename source file


-- Part II: Scrub the data
BEGIN TRANSACTION

INSERT Contacts (CN_FName,CN_LName,CN_Phone)
SELECT SUBSTRING(CI_Name,1,CHARINDEX(CHAR(32),CI_Name)-1),
       SUBSTRING(CI_Name,CHARINDEX(CHAR(32),CI_Name)+1,DATALENGTH(CI_Name)),
       CI_Phone
FROM ContactsImport
WHERE CI_Processed='N'

IF @@Error <> 0
 GOTO ErrorHandler

UPDATE ContactsImport SET CI_Processed='Y'

IF @@Error <> 0
 GOTO ErrorHandler

COMMIT TRANSACTION

ErrorHandler:
IF @@TRANCOUNT > 0
 ROLLBACK TRANSACTION
RETURN
-- End of Part II: Scrub the data
go

To facilitate the explanation I have separated the code into two Parts. The code in Part I moves/renames the file using xp_cmdshell. xp_cmdshell is an extended stored procedure that is used to execute an O/S command. I use the CONVERT and REPLACE functions to create the new file name that conforms to a mmddyy.txt convention and then execute the Move command. The source directory is C:\Articles and the archiving directory is: C:\Articles\Archive.

NOTE: A user must be granted Execute permission on xp_cmdshell unless they are a member of the Sysadmin fixed server role.

In Part II I scrub the data. The contact name is stored in one column in the data source (text file), but I want to store it in two in the Contacts table. The SUBSTRING and CHARINDEX functions allow me to find the space between the first and last name and parse accordingly.

For my testing I executed the above DDL in tempdb.

Creating the Package

Now that we have all the pieces in place we can create the package. Start a new package using the instructions in the DTS: A Brief Introduction section, and complete the following:

  1. Click Task on the menu and select File Transfer Protocol Task
  2. Make sure Internet Site is selected as the source and then specify the connection information for the ftp site
  3. Input the Destination directory
  4. Click the Files tab and a list of files on the site will be displayed
  5. Double-click the source file and click OK

At this point you can test the task by right-clicking on it and selecting Execute Step. The file will be moved to the destination directory.

The next step is to create the Transform Data Task that transfers the data from the text file into the ContactsImport table. Before you can create this task, you must define the source/destination connections. Create the source connection by completing the following:

  1. Click Connection on the menu and select Text File (Source).
  2. Specify the File name (and location) and click OK.
  3. In the Text File Properties dialog ensure that the Fixed Field is selected and click Next.
  4. If the data is aligned per the file format specified above, the column marker (black line with arrow) will automatically be placed at the 40 position and the end-of-file marker (red line) will be at 50. Click Finish and OK to complete the connection definition.

Now we need to create the connection for the destination. Complete the following to do this.

  1. Click Connection on the menu and select Microsoft OLE DB Provider for SQL Server
  2. Specify the server name and connection info and ensure tempdb (or what ever database you are using) is selected as the target database

The source/destination connections are defined, so let's create the Transform Data Task with the following:

  1. Click Tasks on the menu and select Transform Data Task.
  2. Click the source connection and then the destination connection. An arrow is created that links the two connections.
  3. Double-click the arrow and select the Destination tab. On my server, the Contacts table is selected by default because it is the first in the alpha sort of the table list for tempdb. Select the ContactsImport table and click the Transformations tab. Notice the arrows correctly map the source and destination columns.
  4. Click OK to close the dialog.

When you have multiple tasks within a package they will execute in parallel unless you define the order of execution. In this case we want the File Transfer Protocol Task to execute first, and if that is successful we want to execute the Transform Data Task. We can ensure this happens by completing the following:

  1. Click on the File Transfer Protocol Task
  2. Press the Control key and select the Text File (Source) connection
  3. Click Workflow from the menu and select On Success

The last steps in this process are to archive the source file and scrub the data. We do this by executing the pr_ContactsImport procedure with an Execute SQL Task. Create this task by completing the following:

  1. Click Task on the menu and select Execute SQL Task
  2. Select Microsoft OLE DB Provider for SQL Server in the Existing Connection drop-down
  3. In the SQL Statement input type: EXEC pr_ContactsImport
  4. Click OK to close the dialog

Make sure that it executes after the Transform Data Task by creating an On Success Workflow like we have done twice before.

The package is complete and you can execute it by clicking Package from the menu and selecting Execute. You should also be aware that you can schedule the package to execute automatically. Simply right-click on the package in the Local Packages view and select Schedule Package. For my project I knew that the source files would be uploaded no later than 10 AM, so I scheduled accordingly. After the data is loaded an email is automatically sent that informs the relevant personnel that the process completed successfully.

Final Thoughts

DTS is a great tool. If you have never worked with it I strongly encourage you start experimenting with different methods of transferring/transforming data. I am sure having an understanding of the functionality it supports will benefit you in the long run.

Garth
www.SQLBook.com


Related Articles

How to Asynchronously Execute a DTS package from ASP or ASP.NET (27 March 2005)

Programmatically Save DTS Packages to Files (24 May 2004)

A Practical Guide to SQL Server Yukon Beta 1 DTS (4 February 2004)

DTS and C# (12 September 2003)

SQL Server DTS Best Practices (27 May 2003)

Using DTS and FTP to Push Files (13 January 2003)

How to overwrite DTS package logs everytime the package executes? (17 July 2002)

SQLDTS.com (3 June 2001)

Other Recent Forum Posts

I have a error called "Could not find stored procedure 'usp_binarytreecustomerlist'." (32m)

Cannot Show requested dialogue - Timeout Expired. The Timeout priod elapsed prior to completion of the operation or the server is not responding (6h)

Basic SQL query? (18h)

T-sql - we created Message from app1 and trying to disable from app2 (1d)

SQL select Top 10 records for unique combination of two columns (1d)

SSRS Report Sorting with Grouping Issue (2d)

ORA-01476: divisor is equal to zero (2d)

Create new columns based on min and max values of a record with multiple rows (3d)

- Advertisement -