| Author | Topic |
RaistlinM
Starting Member
4 Posts |
Posted - 2002-03-14 : 11:25:00
|
| I have written a couple of stored procedures to parse out a field into City, Province and Country, but it takes about 30 minutes to perform the task. Is there anything I can do to speed things up? Here is the code I am using:

CREATE PROCEDURE sp_ParseCityProv2
AS
BEGIN
    DECLARE @CityProv AS VARCHAR(100),
            @temp AS VARCHAR(100),
            @WordDiv AS INT,
            @City AS VARCHAR(33),
            @Prov AS VARCHAR(3),
            @location AS VARCHAR(10),
            @Country AS VARCHAR(34)

    -- set a cursor to place the current value of the table into a variable for analysis
    DECLARE AcctCursor SCROLL CURSOR FOR
        SELECT [CityProv], [AcctNo]
        FROM Name_0102test
        WHERE CityProv <> ''

    OPEN AcctCursor
    FETCH FIRST FROM AcctCursor INTO @CityProv, @location

    WHILE @@FETCH_STATUS = 0
    BEGIN
        EXEC sp_LeaveSpaces @CityProv, @CityProv OUTPUT
        SET @temp = @CityProv

        IF @temp <> ''
        BEGIN
            -- grab the city
            SET @WordDiv = CHARINDEX(' ', @temp)
            SET @City = LTRIM(RTRIM(SUBSTRING(@temp, 1, @WordDiv)))
            SET @Prov = ''
            SET @Country = ''
            SET @temp = SUBSTRING(@temp, @WordDiv, LEN(@temp))
            SET @temp = LTRIM(RTRIM(@temp))

            -- parse province / country
            EXEC sp_CityProvFix2 @temp, @Prov OUT, @Country OUT
        END

        IF LTRIM(RTRIM(@Country)) = LTRIM(RTRIM(@Prov))
            SET @Country = ''

        UPDATE [Name_0102test]
        SET [province] = @Prov,
            [city]     = @City,
            [country]  = @Country
        WHERE [AcctNo] = @location;

        SET @City = ''
        SET @Prov = ''
        SET @Country = ''

        FETCH NEXT FROM AcctCursor INTO @CityProv, @location
    END

    CLOSE AcctCursor
    DEALLOCATE AcctCursor
END

CREATE PROCEDURE sp_CityProvFix2 (
    @Input varchar(30),
    @Output1 varchar(30) OUTPUT,
    @Output2 varchar(30) OUTPUT
)
AS
DECLARE @CityProv AS VARCHAR(100),
        @temp AS VARCHAR(100),
        @temp2 AS VARCHAR(100),
        @count AS INT,
        @position AS INT,
        @position2 AS INT,
        @WordDiv AS INT,
        @City AS VARCHAR(33),
        @Prov AS VARCHAR(33),
        @Prov2 AS VARCHAR(33),
        @check AS INT,
        @location AS VARCHAR(10),
        @Country AS VARCHAR(34)
BEGIN
    SET @Prov = ''
    SET @temp2 = ''
    SET @check = 0
    SET @temp = @Input
    SET @Prov2 = ''

    -- *************************** Prov / Country ************************
    -- Concatenate abbreviations such as p.e.i.
    SET @WordDiv = CHARINDEX(' ', @temp)
    WHILE @WordDiv = 2
    BEGIN
        SET @Prov2 = @Prov2 + LEFT(@temp, 1)
        IF @WordDiv = 0
        BEGIN
            SET @Prov = LTRIM(RTRIM(@Prov)) + LTRIM(RTRIM(SUBSTRING(@temp, 1, 1)))
            SET @temp = SUBSTRING(@temp, 2, LEN(@temp))
            SET @CityProv = @temp
        END
        IF @WordDiv <> 0
            SET @temp = SUBSTRING(@temp, @WordDiv, LEN(@temp))
        SET @temp = RTRIM(LTRIM(@temp))
        SET @WordDiv = CHARINDEX(' ', @temp)
        SET @check = 1
        SET @Output1 = @Prov2
        IF LEN(@temp) = 1
        BEGIN
            SET @Output2 = ''
            SET @Prov2 = @Prov2 + @temp
            SET @Output1 = @Prov2
        END
    END

    IF @WordDiv > 2
    BEGIN
        SET @Prov = LTRIM(RTRIM(LEFT(@temp, @WordDiv)))
        SET @Output1 = @Prov
        SET @temp2 = LTRIM(RTRIM(SUBSTRING(@temp, @WordDiv, LEN(@temp))))
        SET @Output2 = @temp2
    END

    IF @WordDiv = 0
        SET @Output1 = LTRIM(RTRIM(@temp))
END

CREATE PROCEDURE sp_LeaveSpaces (
    -- The string to have all commas and periods removed
    @input varchar(8000),
    @output varchar(8000) OUTPUT
)
AS
DECLARE @position AS INT, @position2 AS INT
BEGIN
    SET @position = 1
    SET @position2 = 1
    WHILE (@position <> 0)
    BEGIN
        SET @position = CHARINDEX(',', @input)
        IF @position <> 0
            SET @input = STUFF(@input, @position, 1, ' ')
        WHILE (@position2 <> 0)
        BEGIN
            SET @position2 = CHARINDEX('.', @input)
            IF @position2 <> 0
                SET @input = STUFF(@input, @position2, 1, ' ')
        END
    END
    SET @output = @input
END |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2002-03-14 : 11:38:13
|
| 1. DON'T USE CURSORS!!!

2. Take a look at these CSV articles:

http://www.sqlteam.com/SearchResults.asp?SearchTerms=csv

There are several ways you can parse out comma-separated values, which you can modify to split words. Also look at this article:

http://www.sqlteam.com/item.asp?ItemID=5857

It's a keyword search procedure that uses some pretty sweet word-parsing techniques.

3. Here's a much shorter version of sp_LeaveSpaces:

CREATE PROCEDURE sp_LeaveSpaces (
    -- The string to have all commas and periods removed
    @input varchar(8000),
    @output varchar(8000) OUTPUT
)
AS
SET @output = REPLACE(REPLACE(@input, ',', ' '), '.', ' ')

In fact, I would recommend that you look to incorporate this code into a single procedure, because you can use the REPLACE() function in a set-based operation. You won't have the same flexibility if you call this as a function from within another procedure (you'd still have to use cursors). |
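A set-based version of the cleanup suggested in the last paragraph might look like this. This is a sketch only: it assumes the Name_0102test table and CityProv column from the first post, and that overwriting CityProv in place is acceptable.

```sql
-- Sketch: strip commas and periods for every row in one statement,
-- with no cursor and no per-row procedure call
UPDATE Name_0102test
SET CityProv = REPLACE(REPLACE(CityProv, ',', ' '), '.', ' ')
WHERE CityProv <> ''
```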
 |
|
|
RaistlinM
Starting Member
4 Posts |
Posted - 2002-03-14 : 12:30:42
|
| Ahh, but they are not always comma-separated values, and sometimes there are 3 elements or 2. For example:

record 1    Ottawa, On, Canada
record 2    Toronto, On CA
record 3    Halifax, NS
record 4    Charlotte, P.E.I. Canada
record 5    Orlando Florida

The program has to be able to take into consideration all of these. |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2002-03-14 : 13:08:12
|
| You could play with the string and use the PARSENAME function to pull out each piece:

1   Ottawa, On, Canada
2   Toronto, On CA
3   Halifax, NS
4   Charlotte, P.E.I. Canada
5   Orlando Florida

SELECT PARSENAME(REPLACE(REPLACE(REPLACE(CityProv, '.', ''), ',', ''), ' ', '.'), 1)
FROM Table

-- returns
1   Canada
2   CA
3   NS
4   Canada
5   Florida

Now, you have some concerns about when they include a country code and when they don't. You can use PARSENAME to parse up to 4 sections and test each piece (if a section is empty, PARSENAME will return NULL). As ugly as this code will look, it will at least let you operate on the entire set in one operation:

SELECT CityProv,
    CASE LEN(REPLACE(REPLACE(REPLACE(CityProv, '.', ''), ',', ''), ' ', '.'))
       - LEN(REPLACE(REPLACE(REPLACE(REPLACE(CityProv, '.', ''), ',', ''), ' ', '.'), '.', ''))
        WHEN 1 THEN PARSENAME(REPLACE(REPLACE(REPLACE(CityProv, '.', ''), ',', ''), ' ', '.'), 2)
        WHEN 2 THEN PARSENAME(REPLACE(REPLACE(REPLACE(CityProv, '.', ''), ',', ''), ' ', '.'), 3)
        WHEN 3 THEN PARSENAME(REPLACE(REPLACE(REPLACE(CityProv, '.', ''), ',', ''), ' ', '.'), 4)
        ELSE NULL
    END AS City,
    CASE LEN(REPLACE(REPLACE(REPLACE(CityProv, '.', ''), ',', ''), ' ', '.'))
       - LEN(REPLACE(REPLACE(REPLACE(REPLACE(CityProv, '.', ''), ',', ''), ' ', '.'), '.', ''))
        WHEN 1 THEN PARSENAME(REPLACE(REPLACE(REPLACE(CityProv, '.', ''), ',', ''), ' ', '.'), 1)
        WHEN 2 THEN PARSENAME(REPLACE(REPLACE(REPLACE(CityProv, '.', ''), ',', ''), ' ', '.'), 2)
        WHEN 3 THEN PARSENAME(REPLACE(REPLACE(REPLACE(CityProv, '.', ''), ',', ''), ' ', '.'), 3)
        ELSE NULL
    END AS StateProv,
    CASE LEN(REPLACE(REPLACE(REPLACE(CityProv, '.', ''), ',', ''), ' ', '.'))
       - LEN(REPLACE(REPLACE(REPLACE(REPLACE(CityProv, '.', ''), ',', ''), ' ', '.'), '.', ''))
        WHEN 2 THEN PARSENAME(REPLACE(REPLACE(REPLACE(CityProv, '.', ''), ',', ''), ' ', '.'), 1)
        ELSE NULL
    END AS Country
FROM Name_0102test

Now, before you throw up, see if that SELECT statement will work (just copy it and paste it into Query Analyzer). You should get 4 columns: CityProv, City, State/Prov, and Country. I'm curious to see if this works because I can't really test it (I'll try it with some generated data, but it might take me some time).

The one problem I anticipate is if you have something like "St. George, NS, Canada" or another instance where the city is 2 or more words. I might have an idea on how to handle this, but I need to see if the above works first. Let me know! HTH

This has been tested and the statement above was corrected! I managed to get exactly the results I anticipated, so it does appear to work! Give it a try!

Edited by - robvolk on 03/14/2002 13:23:00 |
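The CASE tests above rely on a separator-counting trick: once periods and commas are stripped and spaces are converted to dots, subtracting the length of the dot-free string from the full length gives the number of word separators. A minimal illustration (the variable is hypothetical):

```sql
DECLARE @s varchar(100)
SET @s = 'Ottawa.On.Canada'  -- what 'Ottawa, On, Canada' becomes after the REPLACE chain
-- 16 characters minus 14 without the dots = 2 separators, i.e. three words
SELECT LEN(@s) - LEN(REPLACE(@s, '.', '')) AS SeparatorCount
```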
 |
|
|
RaistlinM
Starting Member
4 Posts |
Posted - 2002-03-14 : 15:05:34
|
| Soooo close, but it ran into problems with this type of info:

Orig              City        Prov    Country
NEPEAN,ON CA  --> NEPEANON    CA      NULL
BARRIE ON CAN --> BARRIE      NULL    NULL
KANATA ON     --> KANATA      NULL    ON

As far as cities such as "New York" go, I am currently using this as a revised sp_LeaveSpaces:

CREATE PROCEDURE sp_LeaveSpaces (
    -- The string to have all commas and periods removed
    @input varchar(8000),
    @output varchar(8000) OUTPUT
)
AS
DECLARE @position AS INT, @checker AS INT, @checked AS INT, @position2 AS INT
BEGIN
    SET @position = 1
    SET @checker = 999
    SET @checked = 0
    SET @position2 = 1
    WHILE (@position <> 0)
    BEGIN
        SET @position = CHARINDEX(',', @input)
        IF @position = @checker + 1
        BEGIN
            SET @input = STUFF(@input, @position, 1, ' zzzz ')
            SET @checked = 1
        END
        SET @checker = @position
        IF @checked = 0
        BEGIN
            IF @position <> 0
                SET @input = STUFF(@input, @position, 1, ' ')
        END
        SET @checked = 0
        WHILE (@position2 <> 0)
        BEGIN
            SET @position2 = CHARINDEX('.', @input)
            IF @position2 <> 0
                SET @input = STUFF(@input, @position2, 1, ' ')
        END
    END
    SET @output = @input
END

Thank you very much for all your help :)

Edited by - RaistlinM on 03/14/2002 15:13:19 |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2002-03-14 : 15:17:56
|
| Here's a question for you: are you cleaning up data in order to put it into a new system, or will there continue to be new data entered in this fashion? I'd suggest that if you need to separate city, state and country, you redesign your table(s) to have separate columns for each. It's a losing battle to keep cleaning up data that will keep being entered this way.

If you ARE changing the structure, and this gives you 90-95% correct results, can you move the data to the new structure and manually fix any rows it misses? I'm concerned that you'll spend a lot of time and effort on this and get frustrated with the results. |
 |
|
|
RaistlinM
Starting Member
4 Posts |
Posted - 2002-03-15 : 09:19:18
|
| The data comes this way from a file. :( I will probably leave it with the slower version since it is more accurate with the results, and have the procedures scheduled at late hours so processing time does not matter. :) Thank you for all your help.

I just created a new table with the same fields I was working with and had the parsed information inserted into it. Once my while loop finished inserting all the data, I ran an update of the main table with an inner join to my temporary one, and this cut my program's execution time from 24 minutes to 1 minute 10 seconds. It turns out all the updates I was doing in the loop were slowing it down. Since INSERT just sticks the information into a new record, it sped things up drastically.

Thanks for all your help.

Edited by - RaistlinM on 03/15/2002 10:47:08 |
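The insert-then-join pattern described above might be sketched roughly like this. Name_0102parsed is a hypothetical staging table; the variable and column names follow the earlier posts:

```sql
-- Inside the loop: cheap INSERTs into a staging table
-- instead of per-row UPDATEs against the main table
INSERT INTO Name_0102parsed (AcctNo, city, province, country)
VALUES (@location, @city, @prov, @country)

-- After the loop: one set-based UPDATE joining the staging table back
UPDATE n
SET n.city     = p.city,
    n.province = p.province,
    n.country  = p.country
FROM Name_0102test AS n
    INNER JOIN Name_0102parsed AS p
        ON n.AcctNo = p.AcctNo
```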
 |
|
|