Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Splitting a name column

Author  Topic 

nlocklin
Yak Posting Veteran

69 Posts

Posted - 2002-07-01 : 10:04:36
I'm working with some legacy data, and unfortunately I have a table that came from the mainframe with people's names in just one column (as opposed to a column for first name, a column for last name, etc).

Unfortunately this column stores just about everything you could imagine putting in a name - any kind of prefix that you can think of (Dr., Mr., Ms., Hon., His Excellency, etc.), odd names such as hyphenated last names (as well as last names that should have been hyphenated but weren't for whatever reason), and all kinds of suffixes including types of degrees (so many in fact, that creating a list of them would be difficult). Some names have middle initials, some don't. Some names have full middle names, some don't.

Sadly, this is the most challenging aspect of the whole migration. Has anyone worked with anything like this before? I'm writing some VBScript in a DTS package with an enormous set of rules to try and identify the different components of each name field. So far I think I have something that would be successful against about 80% or 90% of the names in the table. Just wondering if anyone here has done something like this, and what your solution was.


--
"I'm always doing that. I'm always messing up some mundane detail."

Nazim
A custom title

1408 Posts

Posted - 2002-07-01 : 10:11:20
if therez consitency in prefixing it would be easy . something like a fullstop after each prefix. if this is true then you can play with charindex to split the column.



-------------------------
What lies behind you and what lies ahead of you are small matters compared to what lies within you.-Ralph Waldo Emerson


Edited by - Nazim on 07/01/2002 10:12:00
Go to Top of Page

setbasedisthetruepath
Used SQL Salesman

992 Posts

Posted - 2002-07-01 : 10:12:29
There are 3rd party name parsers available; I've used several in the past for the same business reason, mainframe conversions, and they do work fairly well.

Jonathan Boott, MCDBA
Go to Top of Page

AndrewMurphy
Master Smack Fu Yak Hacker

2916 Posts

Posted - 2002-07-01 : 11:38:07
it'll probably take a couple of passes to sort it out...and in the end it'll probaly come down to a visual inspection/QA + repair by your userbase....


but tackling the obvious exceptions first and working forwards to the clean(ish) data might work out simplest.


1. parse (and remove) from the input data the predictable prefixes..the dr. mr, mrs, etc already identified
2. handle the same as suffixes...!!!
3. parse (and remove) 1st names longer than 1/2 chars?
4. parse (and remove) last names longer than 1/2 chars?
5. mark any remaining unparsed data as 'for user fixing'!!

6. then running something like the following will also help with the quality control...

Select firstname, count(*) from cleantable1
group by firstname
order by count(*) desc

should show up the distinct first (second, title) names produced by the conversion process....oddities should be able to be identified way faster that way, than visually scanning all the data...typically they'll have a count(*) of 1...


passing the resultant individual names through a spell-checker might give a bigger confidence that the process is viable....

Go to Top of Page
   

- Advertisement -