Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2012 Forums
 SSIS and Import/Export (2012)
 SSIS Data validation

Author  Topic 

GerenDFS
Starting Member

1 Post

Posted - 2013-12-17 : 16:22:47
Hello,

In my company we receive data files from customers. Most often some type of text based flat file. We import these via SSIS and then send the data along to where it is needed.

We've seen a growing need to have some type of validation on this data. Here is what we currently do (sorry for the length, I just want to clearly explain everything).


- Data is uploaded by customer to a folder.

- A job runs every X minutes that executes a SSIS package. The SSIS Package has a container loop that looks at this folder.

- SSIS package imports data to a temp table.
--- If data fails import complete (badly delimited data, etc) an error path kicks in and moves the file to an error folder and kicks off a generic error email

- Data that imports to the temp table has customer SQL run that looks at each mandatory column to see if data is present. Columns that "fail validation" have a log record written to a log table

- If a file has one or more log records for failed validation the data is removed from the system and an email is sent to the customer

- If there are no validation errors logged data is imported into production tables.


Here are the enhancements we want:

- Data is uploaded by customer to a folder.

- A job runs every X minutes that executes a SSIS package. The SSIS Package has a container loop that looks at this folder.

- Data is validated against some type of schema. That way if something is delimited badly, or a field is too long we can identify that and give the customer that detail in an error. Basically we want to say "Record number 1, field 2 is 50 characters, only 20 allowed)" instead of "Your file was unable to import"

- SSIS package imports data to a temp table.

- Data in temp table has some type of business rule logic run against it. (Field X must be >0 and numeric, field Y*fieldZ must = field A etc). Records failing the rules are recorded and sent to the customer

- If there are no validation errors logged data is imported into production tables.

Management believes there has to be tools or functionality that do this without always writing custom SQL. Biztalk has been suggested, but nobody has any experience with it and setting it up has been very difficult. Is this the right solution? What about SQL Data Quality Services?

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2013-12-18 : 07:03:43
We had the same requirement in one of our projects and what we implemented was this
1. First part is the same - SSIS package with ForEachLoop which picks up files, logs information etc
2. We store the validation patterns as regular expressions inside a rules table. We parse the files using these regex patterns inside a script task in SSIS and see if it has any violations. The regex patterns will be defined for each of fields separately and also there would be one defined for the whole row. The first step is to see if file row matches the whole row pattern. If yes, then its marked as success in log. In case of violation we parse the individual columns of row against corresponding column regex pattern and identify which row which column failed. we log these details also to log table.
Finally we check the count of rows in log with error. if its 0 we write to table. If > 0 we capture error details and send it in a mail to sender.
3. we've next step where we validate business rules ie Field x should be unique, Fieldy should be between 1000 and 10000 etc. this we do by means of execute sql task by calling a procedure and doing series of update statement to check and update log table with error if any. finally error rows anre sent back with corresponding error details and success rows transferred to final table. Then a mail is sent out to sender indicating success.

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page
   

- Advertisement -