Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 Performance Problem

Author  Topic 

RaistlinM
Starting Member

4 Posts

Posted - 2002-03-14 : 11:25:00
I have written a couple of stored procedures to parse our a field into City , Province and Country but it takes about 30 minutes to perform the task.

Is there anything i can do to speed things up?

Here is the code i am using:


CREATE PROCEDURE sp_ParseCityProv2 AS

BEGIN
DECLARE @CityProv AS VARCHAR(100),
@temp AS Varchar(100),
@WordDiv as int,
@City AS VARCHAR(33),
@Prov AS VARCHAR(3),
@location as varchar(10),
@Country AS VARCHAR(34)
-- set a cursor to place the current value of the table into a variable for analysis
DECLARE AcctCursor SCROLL CURSOR FOR
SELECT [CityProv], [AcctNo]
FROM Name_0102test where CityProv <> ''
OPEN AcctCursor
FETCH FIRST FROM AcctCursor INTO
@CityProv, @location
While @@fetch_status = 0
Begin
EXEC sp_LeaveSpaces @CityProv , @CityProv OUTPUT
SET @temp = @cityprov
If @temp <> ''
Begin
-- grab the city
SET @WordDiv = Charindex(' ',@temp)
set @city = Ltrim(RTRIM(substring(@temp,1,@worddiv)))
Set @Prov = ''
set @country = ''
Set @temp = Substring(@temp,@worddiv, len(@temp))
set @temp = Ltrim(RTRIM(@temp))
--parse province / country
Exec sp_CityProvFix2 @temp , @prov OUT , @Country OUT
End
if ltrim(rtrim(@country))=ltrim(rtrim(@prov))
set @country = ''
UPDATE [Name_0102test]
SET [Name_0102test].[province]=@prov,[Name_0102test].[city]=@city,[Name_0102test].[country]=@country
WHERE [Name_0102test].[AcctNo]=@location;
set @city=''
set @prov=''
set @country=''
FETCH NEXT FROM AcctCursor INTO @CityProv, @location
End
CLOSE AcctCursor
DEALLOCATE AcctCursor
END


CREATE PROCEDURE sp_CityProvFix2
(
@Input varchar(30),
@Output1 varchar(30) OUTPUT,
@Output2 varchar(30) OUTPUT
)
AS

DECLARE @CityProv AS VARCHAR(100),
@temp AS Varchar(100),
@temp2 AS Varchar(100),
@count as int,
@position as int,
@position2 as int,
@WordDiv as int,
@City AS VARCHAR(33),
@Prov AS VARCHAR(33),
@Prov2 AS VARCHAR(33),
@check AS int,
@location as varchar(10),
@Country AS VARCHAR(34)
Begin

Set @prov=''
Set @temp2=''
Set @check = 0
Set @temp = @input
Set @prov2 = ''
--*************************** Prov / Country ************************
--Concatinate abbreviations such as p.e.i.
SET @WordDiv = Charindex(' ',@temp)
While @WordDiv = 2
Begin
SET @prov2 = @prov2+left(@temp,1)
if @worddiv = 0
begin
Set @prov = ltrim(rtrim(@prov)) + ltrim(rtrim(substring(@temp,1,1)))
Set @temp = substring(@temp,2,len(@temp))
Set @cityprov = @temp
ENd
if @worddiv <> 0
Set @temp = Substring(@temp,@WordDiv,Len(@temp))
Set @temp = rtrim(LTRIM(@temp))
SET @WordDiv = Charindex(' ',@temp)

set @check = 1
set @output1 = @prov2
if len(@temp) = 1
begin
set @output2 = ''
set @prov2=@prov2+@temp
set @output1 = @prov2
end
End
if @WordDiv > 2
Begin
set @prov = ltrim(rtrim(left(@temp,@WordDiv)))
set @output1 = @prov
Set @temp2 = ltrim(rtrim(Substring(@temp,@WordDiv,Len(@temp))))
set @output2 = @temp2
End
if @WordDiv = 0
set @output1 = ltrim(rtrim(@temp))
End

CREATE PROCEDURE sp_LeaveSpaces
(
--The string to have all commas and periods removed
@input varchar(8000),
@output varchar(8000) OUTPUT
)
AS
DECLARE @position as int,
@position2 as int
Begin
Set @position = 1
Set @position2 = 1
while (@position <> 0)
Begin
Set @position = charindex(',',@input)
if @position <> 0
set @input = stuff(@input,@position,1,' ')
While (@position2 <> 0)
Begin
Set @position2 = charindex('.',@input)
if @position2 <> 0
set @input = stuff(@input,@position2,1,' ')
End
End
Set @output = @input
End

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-03-14 : 11:38:13
1. DON'T USE CURSORS!!!
2. Take a look at these CSV articles:

http://www.sqlteam.com/SearchResults.asp?SearchTerms=csv

There are several ways you can parse out comma-separated values, which you can modify to split words. Also look at this article:

http://www.sqlteam.com/item.asp?ItemID=5857

It's a keyword search procedure that uses some pretty sweet word-parsing techniques.

3. Here's a much shorter version of sp_LeaveSpaces:

CREATE PROCEDURE sp_LeaveSpaces
(
--The string to have all commas and periods removed
@input varchar(8000), @output varchar(8000) OUTPUT )
AS
Set @output = Replace(Replace(@input, ',' , ' '), '.', ' ')


In fact, I would recommend that you look to incorporate this code into a single procedure, because you can use the REPLACE() function in a set-based operation. You won't have the same flexibility if you call this as a function from within another procedure (you'd still have to use cursors).

Go to Top of Page

RaistlinM
Starting Member

4 Posts

Posted - 2002-03-14 : 12:30:42
Ahh, but they are not always comma separated values and sometimes there are 3 elements or 2.

for example:
record 1 Ottawa, On, Canada
record 2 Toronto, On CA
record 3 Halifax, NS
Record 4 Charlotte, P.E.I. Canada
Record 5 Orlando Florida

The program has to be able to take into consideration all of these.

Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-03-14 : 13:08:12
You could play with the string and use the PARSENAME function to pull out each piece:

1 Ottawa, On, Canada
2 Toronto, On CA
3 Halifax, NS
4 Charlotte, P.E.I. Canada
5 Orlando Florida


SELECT PARSENAME(Replace(Replace(Replace(CityProv, '.', ''), ',', ''), ' ', '.'), 1)
FROM Table


--returns

1 Canada
2 CA
3 NS
4 Canada
5 Florida


Now, you have some concerns about when they include a country code and when they don't. You can use PARSENAME to parse up to 4 sections and test each piece (if it's empty, parsename will return Null). As ugly as this code will look, it will at least let you operate on the entire set in one operation:

SELECT CityProv,
CASE Len(Replace(Replace(Replace(CityProv, '.', ''), ',', ''), ' ', '.'))-
Len(Replace(Replace(Replace(Replace(CityProv, '.', ''), ',', ''), ' ', '.'), '.', ''))
WHEN 1 THEN
PARSENAME(Replace(Replace(Replace(CityProv, '.', ''), ',', ''), ' ', '.'), 2)
WHEN 2 THEN
PARSENAME(Replace(Replace(Replace(CityProv, '.', ''), ',', ''), ' ', '.'), 3)
WHEN 3 THEN
PARSENAME(Replace(Replace(Replace(CityProv, '.', ''), ',', ''), ' ', '.'), 4)
ELSE Null END AS City,
CASE Len(Replace(Replace(Replace(CityProv, '.', ''), ',', ''), ' ', '.'))-
Len(Replace(Replace(Replace(Replace(CityProv, '.', ''), ',', ''), ' ', '.'), '.', ''))
WHEN 1 THEN
PARSENAME(Replace(Replace(Replace(CityProv, '.', ''), ',', ''), ' ', '.'), 1)
WHEN 2 THEN
PARSENAME(Replace(Replace(Replace(CityProv, '.', ''), ',', ''), ' ', '.'), 2)
WHEN 3 THEN
PARSENAME(Replace(Replace(Replace(CityProv, '.', ''), ',', ''), ' ', '.'), 3)
ELSE Null END AS StateProv,
CASE Len(Replace(Replace(Replace(CityProv, '.', ''), ',', ''), ' ', '.'))-
Len(Replace(Replace(Replace(Replace(CityProv, '.', ''), ',', ''), ' ', '.'), '.', ''))
WHEN 2 THEN
PARSENAME(Replace(Replace(Replace(CityProv, '.', ''), ',', ''), ' ', '.'), 1)
ELSE Null END AS Country
FROM Name_0102test


Now, before you throw up , see if that SELECT statement will work (the one in red, just copy it and paste it into query analyzer). You should get 4 columns: CityProv, City, State/Prov, and Country. I'm curious to see if this works because I can't really test it (I'll try it with some generated data, but it might take me some time).

The one problem I anticipate is if you have something like "St. George, NS, Canada" or another instance where the city is 2 or more words. I might have an idea on how to handle this, but I need to see if the above works first.

Let me know! HTH

This has been tested and the statement above was corrected!

I managed to get exactly the results I anticipated, so it does appear to work! Give it a try!

Edited by - robvolk on 03/14/2002 13:23:00
Go to Top of Page

RaistlinM
Starting Member

4 Posts

Posted - 2002-03-14 : 15:05:34
Soooo close, but it ran into problems with this type of info
Orig--------------------City-----prov--------country
NEPEAN,ON CA -->___NEPEANON___CA__________NULL
BARRIE ON CAN -->___BARRIE_____NULL________NULL
KANATA ON -->___KANATA_____NULL_______ON

As far as cities such as this
New York
I am curently using this as a revised sp_LeaveSpaces:

CREATE PROCEDURE sp_LeaveSpaces ( --The string to have all commas and periods removed @input varchar(8000), @output varchar(8000) OUTPUT ) AS DECLARE @position as int, @checker as int, @checked as int, @position2 as int Begin Set @position = 1 Set @checker = 999 Set @checked = 0 Set @position2 = 1 while (@position <> 0) Begin Set @position = charindex(',',@input) if @position = @checker+1 begin set @input = stuff(@input,@position,1,' zzzz ') set @checked =1 End

Set @checker = @position If @checked = 0 begin if @position <> 0 set @input = stuff(@input,@position,1,' ') end set @checked=0 While (@position2 <> 0) Begin Set @position2 = charindex('.',@input) if @position2 <> 0 set @input = stuff(@input,@position2,1,' ') End End Set @output = @input End

Thank you very much for all your help :)




Edited by - RaistlinM on 03/14/2002 15:13:19
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-03-14 : 15:17:56
Here's a question for you: are you cleaning up data in order to put it into a new system, or will there continue to be new data entered in this fashion? I'd suggest that if you need to separate city, state and country that you redesign your table(s) to have separate columns for each. It's a losing battle to continue to clean up data that will keep being entered in this fashion.

If you ARE changing the structure, if this gives you 90-95% correct results, can you move the data to the new structure and manually fix any that it misses? I'm concerned that you'll spend a lot of time and effort on this and get frustrated with the results.

Go to Top of Page

RaistlinM
Starting Member

4 Posts

Posted - 2002-03-15 : 09:19:18
The data comes this way from a file. :(
I will probably leave it with the slower version since it is more accurate with the results and have the procedures schedualed at late hours so process time does not matter. :)

Thank you for all your help.

I just created a new table with the same fields that i was working with and had the parsed information inserted into it. Once my while loop finished inserting all the data I run an update of the main table with an inner join with my temporary one and this cut my program's execution time from 24 minutes to 1 minute 10 seconds. Turns out all the updates i was doing in the loop was slowing it down. Since insert just sticks the information into a new record it sped things up drastically.

Thanks for all your help.

Edited by - RaistlinM on 03/15/2002 10:47:08
Go to Top of Page
   

- Advertisement -