Please start any new threads on our new site at http://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

Our new SQL Server Forums are live! Come on over! We've restricted the ability to create new threads on these forums.

SQL Server Forums
Profile | Active Topics | Members | Search | Forum FAQ
Username:
Password:
Save Password
Forgot your Password?

 All Forums
 Old Forums
 CLOSED - General SQL Server
 transferring 1 column into several
 Forum Locked
 Printer Friendly
Author Previous Topic Topic Next Topic  

s_a19
Starting Member

15 Posts

Posted - 08/01/2003 :  07:32:17  Show Profile
hy everyone, im having this problem

i am transferring data from one table to another table, but there is compatability issues, as in one table, this is contact which can have numerous amounts in, usually less than 3 people, for example

S Scott, Mr Scott Smith, Scott Smith

as you can see they are in different formats, but are always seperated by a comma, the above is only an example and can be different people

and i need to transfer it, into another table.
with title
inits
name
surname

as the requirements.

this one has me seriously stumped.

thanx

Page47
Flowing Fount of Yak Knowledge

USA
2878 Posts

Posted - 08/01/2003 :  07:41:43  Show Profile
You need to parse your CSV column into a table. Have a look at Parsing CSV Values Into Multiple Rows.

Jay White
{0}
Go to Top of Page

AndrewMurphy
Flowing Fount of Yak Knowledge

Ireland
2916 Posts

Posted - 08/01/2003 :  07:53:51  Show Profile
I'd tackle this in multiple passes....

1. sort out the titles (there's a very limited range of valid titles, which can be found on inspection or by a query - select distinct(left(name,patindex('% %',name)-1)) from customer
where patindex('% %',name)-1 > 0)
removing the titles from the input table after they have been 'moved' into the new table

2. sort out the surnames....slight reverse of the above..again remove
only issue you have here is that not all suffixes are surnames...ie company, ltd, jnr, etc....

3. anything left over could be split into 1st name, middle name (or dumped/concatanated onto 1st name)

4. Initials would be a formula on 1st name....ie left(1stname,1)


The source data having 'multiple customers' per-row should be sorted first....look for CSV on this site....that particular issue has come up often before....and then deal with getting the unstructured names into a structure....you won't be able to get 100% of the names right, but you could come very close to make an automated solution better than no solution.
Go to Top of Page

GreySky
Starting Member

USA
20 Posts

Posted - 08/01/2003 :  15:52:55  Show Profile
I have written a heuristic name parser that will parse names into:

-Prefix
-First
-Middle
-Last
-Suffix
-Title

components. It is industrial strength, can handle single names as well as name composites such as "David C. & Nancy Atkins" (splits into two distinct names: David-C.-Atkins and Nancy-Atkins), and generally is the best heuristic name parser I've ever seen (shucks I wrote it!).

It runs in VB (is an Access database), and I chiefly use it to aid data conversions from those icky "I've never seen a relational database before" formats to a relational design.

If want it, email me at atkinsdc@bellsouth.net or datkins@atl.invesco.com.

David Atkins, MCP

Edited by - GreySky on 08/01/2003 15:56:07
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 08/01/2003 :  15:59:01  Show Profile
How do you know to put what where?

You must accomodate some fallout, no?

What defines out of scope parsing?

quote:

Grey skies are gonna clear up, so put on a happy face




Brett

8-)

SELECT POST=NewId()

Edited by - X002548 on 08/01/2003 15:59:44
Go to Top of Page

s_a19
Starting Member

15 Posts

Posted - 08/03/2003 :  05:27:49  Show Profile
thanx for all the answers, i ideally need to do it, in mysql using oracle. but if i can do it another way. i will
Go to Top of Page

robvolk
Most Valuable Yak

USA
15732 Posts

Posted - 08/03/2003 :  08:07:34  Show Profile  Visit robvolk's Homepage
quote:
i ideally need to do it, in mysql using oracle. but if i can do it another way. i will
I don't know how you're going to parse MySQL data using Oracle, but posting the question on a SQL Server site probably won't help much.

Don't worry, the CSV parsing technique will work with any database product, and once you parse out the multiple comma-separated names, you can modify the technique to parse out the pieces of each name.
Go to Top of Page

s_a19
Starting Member

15 Posts

Posted - 08/03/2003 :  08:21:10  Show Profile
ohh right sorry,
but everything hopefully should still be ok.
what type of things would i need to modify?
Go to Top of Page

robvolk
Most Valuable Yak

USA
15732 Posts

Posted - 08/03/2003 :  08:26:17  Show Profile  Visit robvolk's Homepage
Instead of using a comma as a separator, you'd use a space. If you have data like David describes, then you might also need to look at parsing ampersands and periods too.

I'd recommend doing it in stages, try parsing simple names first, using Charindex and Substring:

SELECT SubString(FullName, 1, CharIndex(' ', FullName)) AS FirstName,
SubString(FullName, CharIndex(' ', FullName)+1, 100) AS LastName
FROM myTable


Once you get those parsed, then you can modify the expression to handle 3 names, Mr./Mrs./Dr. prefixes, and Jr./III/Esq. suffixes and such.
Go to Top of Page

s_a19
Starting Member

15 Posts

Posted - 08/03/2003 :  08:30:29  Show Profile
yeah i was thinking about just doing the ones, such as

A Smith, etc. which i feel should give me some feeling as to what is required for the other parts to complete.

Go to Top of Page

s_a19
Starting Member

15 Posts

Posted - 08/04/2003 :  05:56:31  Show Profile
i have just come to run one of the queries, like below

select substring(contact, 1, CharIndex('',Contact)) AS Name
substring(contact, CharIndex('',Contact)+1,100) AS Surname
from address;

and it comes up with from keyword not found were expected. neone know why?

thanx
Go to Top of Page

GreySky
Starting Member

USA
20 Posts

Posted - 08/04/2003 :  08:23:03  Show Profile
Dear X002548:

If you understand what a heuristic is, then you would realize that it makes assumptions. Of course, if the input is garbage, the output will be garbage; SISO rules still apply.

Quite frankly, I've had numerous replies very similar to yours regarding my name parser. I would only suggest you take a look at it to see what it can do. Those many who have requested it have returned very positive comments.

If you tell me how to parse "Grey skies are gonna clear up, so put on a happy face " into a name, then I'll tell my parser how to do it : ) That's what I mean by a heuristic, I tell the computer how to parse a name based upon how I would do it myself.

--Parser says:
First: Grey
Middle: skies are gonna clear
Last: up
Title: so put on a happy face

--Typical parser heuristics
-Prefix: Pre-defined (arrayed)
-First: First word after a prefix
-Middle: Between first and last, but especially if an initial
-Last: Last word before suffix/title (after last-name compositing)
-Suffix: Pre-defined (arrayed)
-Title: Post suffix or post comma
-Ampersand: Indicates new name for name compositing and binding
-Last name composites: Pre-defined (e.g., Van Gough, de la Hoya)
-Out-of-Scope: Instr() use scanning for illegals typically used for company names (e.g., numbers, phrases like 'institut')
-Name binding and compositing logic

It fills up 18 pages if you paste it into a Word document. So, yeah, until you've seen it ...

David Atkins, MCP



quote:
Originally posted by X002548

How do you know to put what where?

You must accomodate some fallout, no?

What defines out of scope parsing?

quote:

Grey skies are gonna clear up, so put on a happy face




Brett

8-)

SELECT POST=NewId()



Edited by - GreySky on 08/04/2003 08:53:41
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 08/04/2003 :  09:09:33  Show Profile
quote:

If you understand what a heuristic is, then you would realize that it makes assumptions. Of course, if the input is garbage, the output will be garbage; SISO rules still apply.



Pretty big assumption...I think I'll continue with absolutes...such that if something falls outside of the rules, someone will need to know about and correct it...

From Websters:

quote:


heuristic
Pronunciation: hyĆ»'ristik


Matching Terms: heuristic program, heuristic rule, heuristics testing


WordNet Dictionary

Definition: [n] a commonsense rule (or set of rules) intended to increase the probability of solving some problem
[adj] ofo or relating to a general formulation that serves to guide investigation
[adj] (computer science) relating to or using a heuristic rule


Synonyms: heuristic program, heuristic rule, trial-and-error

Antonyms: algorithmic

See Also: formula, lateral thinking, rule




Webster's 1913 Dictionary

Definition: \Heu*ris"tic\, a. [Gr. ? to discover.]
Serving to discover or find out.




Computing Dictionary

Definition: 1. A rule of thumb, simplification, or educated guess that reduces or limits the search for solutions in domains that are difficult and poorly understood. Unlike algorithms, heuristics do not guarantee optimal, or even feasible, solutions and are often used with no theoretical guarantee.

2. approximation algorithm.



See Also: algorithm, programming








Brett

8-)

SELECT POST=NewId()
Go to Top of Page

GreySky
Starting Member

USA
20 Posts

Posted - 08/04/2003 :  09:55:22  Show Profile
quote:
Pretty big assumption...I think I'll continue with absolutes...such that if something falls outside of the rules, someone will need to know about and correct it...


#1 - You assume the "rules" are known, rigid, and used without exception over all non heterogenous data sources you may have to work with.

#2 - How would you solve NP-hard problems, such as the 3d bin packing problem? Would you tell your customer, "Sorry, it's not absolute"? And, yes, I have one of these too, adapted from Vigo, Pisinger, Martello--except it's actually in production and rotates cases to aid in finding an "optimal solution."

Not everything fits so easily within a box, but still needs a solution. Perfection is not always a requirement.

David Atkins, MCP

Edited by - GreySky on 08/04/2003 10:13:29
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 08/04/2003 :  10:17:05  Show Profile
quote:
Originally posted by GreySky
Perfection is not always a requirement.



OK, what ever you say...

An exception handling process will always be part of any system I build...as long as I can make sure I "know" about it...that's why I try to positively identify things...

if it don't fit, you must acquit

"Sorry sir, that $1,000.00 deduction was just a mistake..."



Brett

8-)

SELECT POST=NewId()
Go to Top of Page

GreySky
Starting Member

USA
20 Posts

Posted - 08/04/2003 :  10:32:47  Show Profile
Exception handling has nothing to do with heuristics. Exception handling is for unexpected situations (typically an exception thrown by the processor itself or explicitly raised by an application)

A heuristic solution that is incorrect *doesn't know it's wrong.*

Our points are made.

David

quote:
Originally posted by X002548

quote:
Originally posted by GreySky
Perfection is not always a requirement.



OK, what ever you say...

An exception handling process will always be part of any system I build...as long as I can make sure I "know" about it...that's why I try to positively identify things...

if it don't fit, you must acquit

"Sorry sir, that $1,000.00 deduction was just a mistake..."



Brett

8-)

SELECT POST=NewId()


Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 08/04/2003 :  10:35:23  Show Profile
Thanks...I'll have to read up on the subject...



Brett

8-)

SELECT POST=NewId()
Go to Top of Page

Amethystium
Aged Yak Warrior

Garden of Sakuntala
701 Posts

Posted - 08/04/2003 :  12:37:35  Show Profile  Visit Amethystium's Homepage
You got your copy of Aphelion yet?

----------------
Shadow to Light
Go to Top of Page
  Previous Topic Topic Next Topic  
 Forum Locked
 Printer Friendly
Jump To:
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC Go To Top Of Page
This page was generated in 0.07 seconds. Powered By: Snitz Forums 2000