Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Fast Cleaning process

Author  Topic 

cipriani1984
Constraint Violating Yak Guru

304 Posts

Posted - 2011-09-07 : 09:42:45
Hi,

Im trying to write a process that would clean up phone numbers in order to match them up with another table to gain more data.

CREATE FUNCTION dbo.get_cleanPhoneNumber_fn (@Phone VARCHAR(20))
RETURNS VARCHAR(10)
BEGIN
DECLARE @retval VARCHAR(10)
DECLARE @i TINYINT
DECLARE @j TINYINT
SET @j = LEN(@Phone)
SET @i = 1
SET @retval = ''
WHILE @i <= @j AND LEN(@retval) < 10
BEGIN
IF LEFT(RIGHT(@Phone,@i),1) IN ('0','1','2','3','4','5','6','7','8','9')
SET @retval = LEFT(RIGHT(@Phone,@i),1) + @retval
SET @i = @i + 1
END
RETURN @retval
END


Then use the following query:

SELECT CustomerNumber,
dbo.get_cleanPhoneNumber_fn(mobilePhoneIndicator) AS cleanPhoneNumber INTO #customers
FROM CurrentNameAddress;

The currentNameAddress table is quite large and the whole process takes 5 minutes to complete.

Any ideas on how to improve performance through code?

Thanks

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2011-09-07 : 09:53:04
[code]ALTER FUNCTION dbo.fnGetCleanPhoneNumber
(
@Phone VARCHAR(20)
)
RETURNS VARCHAR(20)
AS
BEGIN
RETURN (
CASE WHEN SUBSTRING(@Phone, 1, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 1, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 2, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 2, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 3, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 3, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 4, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 4, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 5, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 5, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 6, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 6, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 7, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 7, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 8, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 8, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 9, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 9, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 10, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 10, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 11, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 11, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 12, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 12, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 13, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 13, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 14, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 14, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 15, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 15, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 16, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 16, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 17, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 17, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 18, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 18, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 19, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 19, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 20, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 20, 1) ELSE '' END
)
END[/code]


N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

CoderMan1
Starting Member

6 Posts

Posted - 2011-09-07 : 10:01:07
what is your typical input and your desired output? seems like you could just use REPLACE to get rid of characters that would throw off your matching, like (, ), -, and spaces
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2011-09-07 : 10:17:11
I would start with

RETURN
CASE WHEN @Phone NOT LIKE '%[^0-9]%' THEN @Phone
ELSE ... longer, slower code ...

so that numbers that are already just-digits will be returned

Better still would be to use that test externally to avoid calling the function if the number is already clean.
Go to Top of Page

cipriani1984
Constraint Violating Yak Guru

304 Posts

Posted - 2011-09-07 : 10:31:02
The problem is the field contains alot of non numeric characters aswell as rogue characters, which causes a problem

quote:
Originally posted by CoderMan1

what is your typical input and your desired output? seems like you could just use REPLACE to get rid of characters that would throw off your matching, like (, ), -, and spaces

Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2011-09-07 : 10:36:01
So is Peso's code faster?

If there are NO rows that are already "clean", or you've already filtered them out, then ignore my suggestion.
Go to Top of Page

cipriani1984
Constraint Violating Yak Guru

304 Posts

Posted - 2011-09-07 : 10:37:35
No his code returns less results than mine :s
but yes it is faster.

quote:
Originally posted by Kristen

So is Peso's code faster?

If there are NO rows that are already "clean", or you've already filtered them out, then ignore my suggestion.

Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2011-09-07 : 11:07:03
"his code returns less results than mine"

Fewer phone digits? or fewer rows?

I can't see how Peso's code would change the number rows displayed (unless you have something in your WHERE clause that is excluding some of the cleaned-up records for some reason? Perhaps rows where there were no numeric digits at all?)
Go to Top of Page

cipriani1984
Constraint Violating Yak Guru

304 Posts

Posted - 2011-09-07 : 11:19:01
it returns less results

quote:
Originally posted by Kristen

"his code returns less results than mine"

Fewer phone digits? or fewer rows?

I can't see how Peso's code would change the number rows displayed (unless you have something in your WHERE clause that is excluding some of the cleaned-up records for some reason? Perhaps rows where there were no numeric digits at all?)

Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2011-09-07 : 11:57:20
Yeah, you said "less results" before, I don't know what you mean though.

Pesos code just takes a string parameter and returns a, modified, string value. It doesn't prevent rows from being included. So what do you mean by "less results"?
Go to Top of Page

cipriani1984
Constraint Violating Yak Guru

304 Posts

Posted - 2011-09-07 : 12:10:01
My code ran for 5 minutes and produced 4k results

peso's code ran for 2 minutes and produced 1.7k results.

found less matches than my code
thats what i mean

quote:
Originally posted by Kristen

Yeah, you said "less results" before, I don't know what you mean though.

Pesos code just takes a string parameter and returns a, modified, string value. It doesn't prevent rows from being included. So what do you mean by "less results"?

Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2011-09-07 : 14:07:55
Post full query, please!



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2011-09-07 : 14:15:36
You can use this faster approach too, by changing the scalar function to a table function.
DROP FUNCTION dbo.fnGetCleanPhoneNumber
GO
CREATE FUNCTION dbo.fnGetCleanPhoneNumber
(
@Phone VARCHAR(20)
)
RETURNS TABLE
AS
RETURN (
SELECT CASE WHEN SUBSTRING(@Phone, 1, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 1, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 2, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 2, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 3, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 3, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 4, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 4, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 5, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 5, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 6, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 6, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 7, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 7, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 8, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 8, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 9, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 9, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 10, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 10, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 11, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 11, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 12, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 12, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 13, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 13, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 14, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 14, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 15, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 15, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 16, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 16, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 17, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 17, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 18, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 18, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 19, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 19, 1) ELSE '' END +
CASE WHEN SUBSTRING(@Phone, 20, 1) LIKE '[0-9]' THEN SUBSTRING(@Phone, 20, 1) ELSE '' END AS PhoneNumber
)
Then your code look like this instead
SELECT		cna.CustomerNumber,
f.PhoneNumber AS CleanPhoneNumber
INTO #Customers
FROM CurrentNameAddress AS cna
OUTER APPLY dbo.fnGetCleanPhoneNumber(cna.MobilePhoneIndicator) AS f
How long time does it take now?


N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2011-09-08 : 01:24:18
quote:
Originally posted by cipriani1984

My code ran for 5 minutes and produced 4k results

peso's code ran for 2 minutes and produced 1.7k results.

found less matches than my code
thats what i mean


OK, I understand now.

Can you post a value where there was a difference between your code and Peso's please?

Original value, your value, Peso's value are what we need to see.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2011-09-08 : 01:27:56
Note that Peso's function returns 20 characters, yours only returns 10 - so if the number is more than 10 digits yours will be truncating the result. Maybe there are some entries with >10 digits? - not real phone numbers, obviously ...
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2011-09-08 : 02:13:42
If you want the first 10 digits to be returned, change my function from "RETURNS VARCHAR(20)" to "RETURNS VARCHAR(10)".



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page
   

- Advertisement -