Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Postcode Cleanup

Author  Topic 

Kristen
Test

22859 Posts

Posted - 2010-12-16 : 11:22:49
We are getting a number of failures in a data integration process to a 3rd party system because the Customer's post code is invalid.

We use a Post Code Lookup service to validate the postcode - the user types in their postcode, the system offers them suitable addresses and they pick one. (In the UK the postcode references usually one side of the street - so is usually unique to 50 houses, or less)

However, the postcode system always lags a bit behind reality, and has some errors in it, so we do need to allow the Customer to override the system - and that's where my problem is.

Going through today's failures with our Client I was surprised at the failures which are basically Zero instead of Letter-Oh and the like.

This Should be
======== ========
0X1 1AB OX1 1AB - Zero / letter-Oh
AB9 !AB AB9 1AB - Caps lock on in readiness for typing the second part
AB12AB AB1 2AB - Space missing

Sadly the UK postcode system has rather a lot of variations including

AA1 1AA
AA11 1AA
A1A 1AA

and I'm not 100% sure that the second part always uses a "1AA" format even

The first two letters are from a relatively small set (codes for postal towns) so I could make a stab at fixing them, and I could try enforcing characters in the last two positions by silently substituting numbers-to-letters

I then thought I could have a go at numeric bits - e.g. where SHIFT was on when typing the digit

My plan was to do this before the data is passed to the Post Code Database in the hope that instead of getting back a "Huh?" the user would likely see their actual address magically appear and then they would pick it and it would be perfectly formed [;(]

Any suggestions for mechanical methods of fixing these sorts of letter/digit and caps-lock typos would be appreciated.

jcelko
Esteemed SQL Purist

547 Posts

Posted - 2010-12-16 : 11:42:50
quote:
Originally posted by Kristen
Going through today's failures with our Client I was surprised at the failures which are basically Zero instead of Letter-Oh and the like.
Sadly the UK postcode system has rather a lot of variations including

Any suggestions for mechanical methods of fixing these sorts of letter/digit and caps-lock typos would be appreciated.



I had a section in my book "Data, Measurements and Standards in SQL" on the nightmare that is the UK postal code. I was so happy to see that you are getting a 5-digit MailSort system for bulk mailers.

You can find regualr expressions for the code at:

http://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Validation



--CELKO--
Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2010-12-16 : 12:09:46
quote:
Originally posted by jcelko

You can find regualr expressions for the code at:

http://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Validation



Jcelko, thank you for that link. Not that I have any interest in UK postal codes, but it makes very interesting reading. For instance, it says "Completely accurate validation is only possible by attempting to deliver mail to the address, and verifying with the recipient."

And that brought to mind a picture of Kristen trying to deliver mail to an address in London in the middle of winter.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-12-16 : 12:33:36
Thanks Joe - I had hunted high & low on the post OFfice's web site for a clear definition - but they charge for their database so I suppose have no incentive to actually describe the format anywhere - I hadn't thought to look on Wikipedia.

I'm thinking of having some patterns for likely incorrect values and leaving it at that. I might log things that fail the RegEx to see what sorts of values people are attempting to use, and "improve" the autoMagic fix over time.

But there is no point me just imposing a RegEx on the Customers because I don't suppose they will understand why their data entry is being rejected if it "l00ks 0K"

The formats available appear to be limited to:

A9 9AA
A99 9AA
AA9 9AA
AA99 9AA
A9A 9AA
AA9A 9AA
GIR 0AA
BFPO 9[999]


There is a list of valid leading A & AA codes - but it has 120 valid values, so not exactly easy to guess what typos might have been intended to be - particularly as

A99 9AA
might be a typo for
AA9 9AA
and vice-versa

For postcodes A9X 9AA the "X" does not include the letters "ILOQZ" - which were presumably excluded as being similar to digits - so that may help autoMagic substitutions.

In the second part AA9 9XX the "XX" does not include the letters "CIKMOV" - wonderful!
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-12-16 : 12:39:13
quote:
Originally posted by sunitabeck

it says "Completely accurate validation is only possible by attempting to deliver mail to the address, and verifying with the recipient."


Although I'm not sure that is fair ... there is a Post Code Loopkup database available (for money) so most UK websites allow you to type in your postcode and then you pick your house name / number from a list and your whole address is filled in - which is pretty convenient for call centres and websites alike ....

The problem is that the service we use (which in all other ways is excellent) doesn't return anything useful if you try to tell it your postcode is 0X1 1AB (instead of OX1 1AB)
Go to Top of Page
   

- Advertisement -