Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Identifying and fixing pattern in address field

Author  Topic 

ekaiser
Starting Member

3 Posts

Posted - 2015-03-26 : 11:58:24
How can i scan a Address field and fix repeating patterns such as this? The suite numbers at the end repeat themselves .. sometimes several times
506 N Kenneth J Expressway Ste 1 Ste 1 Ste 1 Ste 1
2678 NC 89 E Highway Ste 1 Ste 1
313 Avenue W S Te Ste 1 Ste 1
579 W Heritage Pk Boulevard Ste 1 Ste 1
6260 W Indian School Rd Ste 1 Ste 1
N113W1528 Montgomery Dr Apt 2 Apt 2 Apt 2
W201N1656 Hemlock St Apt 2 Apt 2 Apt 2
W201N1656 Hemlock St Apt 2 Apt 2 Apt 2
10549 Valparaiso St Apt 2 Apt 2 Apt 2
655 Walnut St Apt 2 Apt 2 Apt 2
48-425 Kilauea Ave Apt 2 Apt 2 Apt 2
N8961 Holmes Rd Apt 2 Apt 2 Apt 2
9390 Ben C Pratt 6 Mill Cypr Ste 2 Ste 2
235 Blaine Harbor Mar Gate Ste 2 Ste 2
2100 Martin Luther Jr Way Ste 2 Ste 2
600 N Wolfe Street Park Ste 2 Ste 2

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2015-03-26 : 12:39:44
how many repeating patterns are there? e.g. there are two in your sample data. "Ste n" and "Apt n"
How many badly formatted rows do you have?
Go to Top of Page

ekaiser
Starting Member

3 Posts

Posted - 2015-03-27 : 12:00:09
about 30,000 and it could be FL FL or STE STE or a variety of values in the pattern
Go to Top of Page

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2015-03-27 : 12:52:46
Using your data, I handled one case:


declare @ table (baddata varchar(8000))
insert into @ values
('506 N Kenneth J Expressway Ste 1 Ste 1 Ste 1 Ste 1'),
('2678 NC 89 E Highway Ste 1 Ste 1 '),
('313 Avenue W S Te Ste 1 Ste 1 '),
('579 W Heritage Pk Boulevard Ste 1 Ste 1 '),
('6260 W Indian School Rd Ste 1 Ste 1 '),
('N113W1528 Montgomery Dr Apt 2 Apt 2 Apt 2 '),
('W201N1656 Hemlock St Apt 2 Apt 2 Apt 2 '),
('W201N1656 Hemlock St Apt 2 Apt 2 Apt 2 '),
('10549 Valparaiso St Apt 2 Apt 2 Apt 2 '),
('655 Walnut St Apt 2 Apt 2 Apt 2 '),
('48-425 Kilauea Ave Apt 2 Apt 2 Apt 2 '),
('N8961 Holmes Rd Apt 2 Apt 2 Apt 2 '),
('9390 Ben C Pratt 6 Mill Cypr Ste 2 Ste 2 '),
('235 Blaine Harbor Mar Gate Ste 2 Ste 2 '),
('2100 Martin Luther Jr Way Ste 2 Ste 2 '),
('600 N Wolfe Street Park Ste 2 Ste 2 ')

update @
set baddata = rtrim(baddata)

select baddata, substring(baddata, 1,n-1) gooddata
from @
cross apply (select patindex('%ste %', baddata)) _1(ste)
cross apply (select ste + patindex('%[0-9]%', substring(baddata, _1.ste, len(baddata)))) _2(n)
cross apply (select n + patindex('%[^0-9]%', substring(baddata, n, len(baddata)))) _3(not_n)
where baddata like '%ste%ste%'


However, you will need to do this once per bad pattern. Also what if you have an address like:

'1 main st fl fl 1'

That is FL as in Florida and FL again as in FLOOR?
Go to Top of Page

ScottPletcher
Aged Yak Warrior

550 Posts

Posted - 2015-03-27 : 15:50:07
Honestly, I'd consider calling Google's address engine (at least they used to provide a public interface to call it) or someone else's and have this done for you. Parsing and correcting addresses is overall an extremely complex task.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2015-03-27 : 15:56:07
For double repeating patterns at the end of a field you could check for " XX" at the end (i.e. "space and 2 letters") where those 3-characters exactly matches the same 3-characters twice at the end of the address line. Then repeat for " XXX" (Space and 3 letters) and so on.

SELECT *
FROM MyTable
WHERE AddressLine LIKE '% [A-Za-z0-9][A-Za-z0-9]' -- Ends in " XX"
AND RIGHT(AddressLine, 6) = RIGHT(AddressLine, 3) + RIGHT(AddressLine, 3)

Run the code multiple times to remove triple / quadruple duplications at the end of the line.

Whatever you do there will probably be legitimate duplicate-ending addresses ... "New York New York" like the song
Go to Top of Page
   

- Advertisement -