| Author |
Topic |
|
tfountain
Constraint Violating Yak Guru
491 Posts |
Posted - 2004-04-07 : 16:30:45
|
| I'm writing a process that receives data in several formats for a particular column. For example, we store data like phone number, zip codes, etc without formatting (only the numbers). The import process I'm writing can expect the data for phone number to have any number of formats such as:1112222555111222111-2222(555)111-22221-555-111-222and the list goes on and on. I'm looking for a simple way to update the data before I begin importing it to remove any non numeric characters. Any quick and dirty tips? I could use the REPLACE function that will limit me to specific characters I need to look for. I want to remove any non numeric data (including spaces) and then validate the data based on the length. This way if they decide to start sending additional characters I'll be covered. I have to keep in mind that I'm dealing with rather large tables so performance is a definite consideration.Thanks in advance! |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2004-04-07 : 16:34:07
|
| You will need to use REPLACE several times in order to get the desired format. You should not have any performance problems with using multiple REPLACEs in one statement.Tara |
 |
|
|
tfountain
Constraint Violating Yak Guru
491 Posts |
Posted - 2004-04-07 : 16:58:45
|
quote: Originally posted by tduggan You will need to use REPLACE several times in order to get the desired format. You should not have any performance problems with using multiple REPLACEs in one statement.Tara
But how would one catch all non numeric characters other than coding for them all? It's really a shame REPLACE doesn't use wildcards :). |
 |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2004-04-07 : 17:01:25
|
| You can't catch them. You have to explicitly tell it to replace one value with another. Is this a one-time script? If not, then your application should only allow certain characters be entered into the field.Tara |
 |
|
|
tfountain
Constraint Violating Yak Guru
491 Posts |
Posted - 2004-04-07 : 17:08:37
|
quote: Originally posted by tduggan You can't catch them. You have to explicitly tell it to replace one value with another. Is this a one-time script? If not, then your application should only allow certain characters be entered into the field.Tara
Actually, it's an application taking data into our SQL Server from an extremely outdated xbase system that is probably older than I am. It's a permenant application. Outside of creating some regex functions the best solution I can think of is to create a cursor of all records with non numeric values, loop through the cursor building another string by using PATINDEX for anything not 0-9 and then updating the column again. But, now I'm dealing with performance possibly. And I really can see this other system (which I have no control over) sending over other invalid data other than what I'm getting today.For now, I wrote a simple update using nested REPLACEs and do an exists check for non numeric values. This way I can at least catch it again, stop the process and change the procedure to accomodate it. Not the most elegant (by far!) but it will work for now. |
 |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2004-04-07 : 18:22:14
|
you can write a simple, generic UDF for this. here's an example:create function Strip(@String varchar(8000), @PatternToKeep varchar(100))returns varchar(8000)asbegin declare @i int; declare @c varchar(1); declare @Result varchar(8000); set @Result = ''; set @i=1; while (@i <= len(@String)) select @c=substring(@String,@i,1), @Result=@Result + case when @c like @PatternToKeep then @c else '' end, @i = @i +1 return (@Result)endgoselect dbo.Strip('1-800-123-1234','[0-9]')not too often you see a WHILE without a BEGIN/END block .... enjoy! |
 |
|
|
nr
SQLTeam MVY
12543 Posts |
Posted - 2004-04-07 : 20:23:40
|
| To get rid of non-numeric charshttp://www.nigelrivett.net/RemoveNonNumericCharacters.htmlwill be faster than testing every character in turn as above. select @i = patindex('%[^0-9]%', @s) while @i > 0 begin select @s = replace(@s, substring(@s, @i, 1), '') select @i = patindex('%[^0-9]%', @s) end==========================================Cursors are useful if you don't know sql.DTS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2004-04-07 : 20:35:07
|
Nigel,You have broken the WHILE without the BEGIN/END rule.. :-)In the interest of making it as hard to understand as possible for the next developer...CREATE FUNCTION StripOut(@STRING VARCHAR(8000), @PATTERN VARCHAR(100))RETURNS VARCHAR(8000)ASBEGINWHILE PATINDEX(@PATTERN, @STRING) != 0 SELECT @STRING = REPLACE(@STRING, SUBSTRING(@STRING,PATINDEX(@PATTERN, @STRING),1), '')RETURN (@STRING)ENDgoselect dbo.StripOut('1-800-123-1234','%-%')select dbo.StripOut('1-800-123-1234','%[^0-9]%')DavidM"If you are not my family or friend, then I will discriminate against you" |
 |
|
|
tfountain
Constraint Violating Yak Guru
491 Posts |
Posted - 2004-04-09 : 17:22:02
|
| Thanks for the suggestions. These will definitely be used :). For example, I'm testing a full sampling of data for phone numbers and I'm getting garbage like this:3018697510`313=31030403608345044'617.787.01617.438.90NPUNLISTEDNP617.623.03617.332.68617.244.97NPGETPATIENTNP617.242.15617.332.68603.682.00781.365.14NA |
 |
|
|
|