Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 Simple way to replace any non numeric characters

Author  Topic 

tfountain
Constraint Violating Yak Guru

491 Posts

Posted - 2004-04-07 : 16:30:45
I'm writing a process that receives data in several formats for a particular column. For example, we store data like phone number, zip codes, etc without formatting (only the numbers). The import process I'm writing can expect the data for phone number to have any number of formats such as:
1112222
555111222
111-2222
(555)111-2222
1-555-111-222
and the list goes on and on. I'm looking for a simple way to update the data before I begin importing it to remove any non numeric characters. Any quick and dirty tips? I could use the REPLACE function that will limit me to specific characters I need to look for. I want to remove any non numeric data (including spaces) and then validate the data based on the length. This way if they decide to start sending additional characters I'll be covered. I have to keep in mind that I'm dealing with rather large tables so performance is a definite consideration.

Thanks in advance!

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2004-04-07 : 16:34:07
You will need to use REPLACE several times in order to get the desired format. You should not have any performance problems with using multiple REPLACEs in one statement.

Tara
Go to Top of Page

tfountain
Constraint Violating Yak Guru

491 Posts

Posted - 2004-04-07 : 16:58:45
quote:
Originally posted by tduggan

You will need to use REPLACE several times in order to get the desired format. You should not have any performance problems with using multiple REPLACEs in one statement.

Tara



But how would one catch all non numeric characters other than coding for them all? It's really a shame REPLACE doesn't use wildcards :).
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2004-04-07 : 17:01:25
You can't catch them. You have to explicitly tell it to replace one value with another. Is this a one-time script? If not, then your application should only allow certain characters be entered into the field.

Tara
Go to Top of Page

tfountain
Constraint Violating Yak Guru

491 Posts

Posted - 2004-04-07 : 17:08:37
quote:
Originally posted by tduggan

You can't catch them. You have to explicitly tell it to replace one value with another. Is this a one-time script? If not, then your application should only allow certain characters be entered into the field.

Tara



Actually, it's an application taking data into our SQL Server from an extremely outdated xbase system that is probably older than I am. It's a permenant application. Outside of creating some regex functions the best solution I can think of is to create a cursor of all records with non numeric values, loop through the cursor building another string by using PATINDEX for anything not 0-9 and then updating the column again. But, now I'm dealing with performance possibly. And I really can see this other system (which I have no control over) sending over other invalid data other than what I'm getting today.

For now, I wrote a simple update using nested REPLACEs and do an exists check for non numeric values. This way I can at least catch it again, stop the process and change the procedure to accomodate it. Not the most elegant (by far!) but it will work for now.
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2004-04-07 : 18:22:14
you can write a simple, generic UDF for this.

here's an example:


create function Strip(@String varchar(8000), @PatternToKeep varchar(100))
returns varchar(8000)
as
begin
declare @i int;
declare @c varchar(1);
declare @Result varchar(8000);
set @Result = '';
set @i=1;
while (@i <= len(@String))
select @c=substring(@String,@i,1),
@Result=@Result + case when @c like @PatternToKeep then @c else '' end,
@i = @i +1
return (@Result)
end

go

select dbo.Strip('1-800-123-1234','[0-9]')


not too often you see a WHILE without a BEGIN/END block .... enjoy!
Go to Top of Page

nr
SQLTeam MVY

12543 Posts

Posted - 2004-04-07 : 20:23:40
To get rid of non-numeric chars
http://www.nigelrivett.net/RemoveNonNumericCharacters.html
will be faster than testing every character in turn as above.


select @i = patindex('%[^0-9]%', @s)
while @i > 0
begin
select @s = replace(@s, substring(@s, @i, 1), '')
select @i = patindex('%[^0-9]%', @s)
end


==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2004-04-07 : 20:35:07
Nigel,

You have broken the WHILE without the BEGIN/END rule.. :-)

In the interest of making it as hard to understand as possible for the next developer...

CREATE FUNCTION StripOut(@STRING VARCHAR(8000), @PATTERN VARCHAR(100))
RETURNS VARCHAR(8000)
AS
BEGIN
WHILE PATINDEX(@PATTERN, @STRING) != 0
SELECT @STRING = REPLACE(@STRING, SUBSTRING(@STRING,PATINDEX(@PATTERN, @STRING),1), '')
RETURN (@STRING)
END
go
select dbo.StripOut('1-800-123-1234','%-%')
select dbo.StripOut('1-800-123-1234','%[^0-9]%')





DavidM

"If you are not my family or friend, then I will discriminate against you"
Go to Top of Page

tfountain
Constraint Violating Yak Guru

491 Posts

Posted - 2004-04-09 : 17:22:02
Thanks for the suggestions. These will definitely be used :). For example, I'm testing a full sampling of data for phone numbers and I'm getting garbage like this:

3018697510`
313=3103040
3608345044'
617.787.01
617.438.90
NP
UNLISTED
NP
617.623.03
617.332.68
617.244.97
NP
GETPATIENT
NP
617.242.15
617.332.68
603.682.00
781.365.14
NA
Go to Top of Page
   

- Advertisement -