Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2012 Forums
 SSIS and Import/Export (2012)
 Bad Address Data

Author  Topic 

brubakerbr
Yak Posting Veteran

53 Posts

Posted - 2013-10-30 : 17:22:59

I am working on a project where I get information from an outside vender in an Excel report and I need to process it through a series of transformations but the below data needs to be split out so that I can have the City and State and dump the rest of the data.

Here is an example of the bad data
[PlantName]

|ATLANTA,GA AIRPORT ONSITE|
|BETHLEHEM, PA|
|ANCHORAGE, AK No Ship HazMat|
|FOREST PARK, GA|
|HUNTINGTON,WV No Ship|

Here is an example of the clean data I need

[PlantName], [City], [State]

|ATLANTA|ATLANTA|GA|
|BETHLEHEM|BETHLEHEM|PA|
|ANCHORAGE|ANCHORAGE|AK|
|FOREST PARK|FOREST PARK|GA|
|HUNTINGTON|HUNTINGTON|WV|

I will need to have the City Name in the Plant Name so that I can use it in a naming convention, a concatenation with another field to create an actual plant name.

I want to use an SSIS expression but I am open to T-SQL if it is quicker and more reliable.

Brian D. Brubaker
Business Intelligence Analyst
Viega LLC

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2013-10-30 : 17:48:14
well, here's one way - but just for the data you posted. The problem with transforming vendor data is who knows what will show up in these fields. And this code below will break if the value doesn't contain a comma so I'm sure there are a lot of tweaks that will need to be made with the real data.

Assumes you will have a [States] table with your state abbreviations. And that you will load the raw data into a staging table.


;with staging (cityState)
as
(
select 'ATLANTA,GA AIRPORT ONSITE' union all
select 'BETHLEHEM, PA' union all
select 'ANCHORAGE, AK No Ship HazMat' union all
select 'FOREST PARK, GA' union all
select 'HUNTINGTON,WV No Ship'
)
, states (mnemonic)
as
(
select 'GA' union all
select 'PA' union all
select 'WV' union all
select 'AK' union all
select 'RI'
)

select left(cityState, charindex(',', cityState)-1) as PlantName
,left(cityState, charindex(',', cityState)-1) as City
,st.mnemonic
from staging sg
left outer join states st
on st.mnemonic = substring(replace(sg.cityState,', ',','), charindex(',', sg.cityState)+1, 2)


OUTPUT:
PlantName City mnemonic
---------------------------- ---------------------------- --------
ATLANTA ATLANTA GA
BETHLEHEM BETHLEHEM PA
ANCHORAGE ANCHORAGE AK
FOREST PARK FOREST PARK GA
HUNTINGTON HUNTINGTON WV


Be One with the Optimizer
TG
Go to Top of Page

brubakerbr
Yak Posting Veteran

53 Posts

Posted - 2013-11-08 : 11:28:50
I ended up using an Expression that worked out pretty well.

FINDSTRING(PlantName,",",1) != 0 ? (SUBSTRING(PlantName,1,FINDSTRING(PlantName,",",1) - 1)) : PlantName

This pulled the City Name out of the bad data and allowed me to use it in two different locations, 1. to build my Clean City Column and 2. in a concat for the Branch name.

I also used the Token Expression to pull the State out for the State column.

I did copy your script though. I may need it on another project.

Thank you for all your help.

Brian D. Brubaker
Business Intelligence Analyst
Viega LLC
Go to Top of Page
   

- Advertisement -