Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 SQL Query to determine postcodes

Author  Topic 

fuzznucket
Starting Member

7 Posts

Posted - 2010-08-01 : 20:00:36
Hey guys,

I was wondering if anyone can give me an idea how I might resolve a problem of finding the postal code of some orphan telephone numbers.

Essentially I need to determine the *most* common occurance of postcode for a given database of phone numbers.

My dataset is quite comprehensive, around 2m phone numbers, and I have postcodes for around 60% of the records.

I'm reasoning that where I have the data as :-

PHONE,POSTCODE
+61290201111,2000
+61290202222,2000
+61290203333,2000
+61290204444,2000
+61290205555,2416

Therefore the most common postcodefor the prefix +6129020 is 2000, and I just want to store that in a table, so I can use that to classify my other records which don't have postcodes. (I'm assuning that some records have incorrect postcode records, hence the reason I want the most common one)

I've tried a GROUP BY query :-

select count(*), postcode, left(phone,8) from maindata
where phone is not null and postcode is not null
group by postcode, left(phone,8)
order by left(phone,8), count(*) desc


However this just lists all the permutations.

I'm quite new to SQL, and suspect that the MAX function might help me, but cant see how to structure it.

Any help greatly appreciated.



slimt_slimt
Aged Yak Warrior

746 Posts

Posted - 2010-08-02 : 02:00:21
try something like this if i understand your problem correctly:


select
count(*) as nof_occ
,postcode
from maindata
where phone is not null
and postcode is not null
group by postcode

-- or try this

select
sum(nof_occ) as nof
,postcode
,left(phone,8)
from
(
select
count(*) as nof_occ
,postcode
from maindata
where phone is not null
and postcode is not null
group by postcode
) as x
join maindata as m
on m.postcode = x.postcode

group by
postcode
,left(phone,8)
Go to Top of Page

fuzznucket
Starting Member

7 Posts

Posted - 2010-08-02 : 05:08:54
Hey Slim,

Thanks very much for taking the time to reply. I don't know if its because I'm running MS-SQL server, but I couldn't get the second example to work.

The first error was :-

Msg 209, Level 16, State 1, Line 18
Ambiguous column name 'postcode'.
Msg 209, Level 16, State 1, Line 3
Ambiguous column name 'POSTCODE'

So I tried running the SELECT part on its own, as :-


(
select
count(*) as nof_occ
,postcode
from ozmedia.dbo.maindata
where phone is not null
and postcode is not null
group by postcode
) as x
join dbo.maindata as m
on m.postcode = x.postcode

but it didn't like the AS X part.

Any ideas?
Go to Top of Page

fuzznucket
Starting Member

7 Posts

Posted - 2010-08-02 : 05:18:30
I played around with it a little more, and changed the postcode to x.postcode as :-

select sum(nof_occ) as nof
,x.POSTCODE,left(phone,8)
from
(
select
postcode,count(*) as nof_occ
from maindata
where phone is not null
and postcode is not null
group by postcode
) as x
join maindata as m
on m.postcode = x.postcode

group by
x.POSTCODE,left(phone,8)

The query started to run, but came up with the error :-

Msg 8115, Level 16, State 2, Line 1
Arithmetic overflow error converting expression to data type int.

If it helps, the phone number field is varchar(50) and the postcode is an INT

Go to Top of Page

fuzznucket
Starting Member

7 Posts

Posted - 2010-08-02 : 05:36:44
If it helps, I've been able to create a table that has the data in the following format :-

phone postcode pcount
+6124465 2428 2
+6124465 2537 3
+6124465 2535 11
+6124465 2577 191
+6124471 2546 2
+6124471 2229 7
+6124471 2536 323
+6124472 2753 1
+6124472 2601 3
+6124472 2550 5
+6124472 2537 22
+6124472 2536 2013
+6124473 2800 1
+6124473 2536 2
+6124473 2630 2
+6124473 2545 100
+6124473 2546 140
+6124473 2537 245
+6124474 4207 2
+6124474 2541 3
+6124474 2536 99
+6124474 2537 990

But what I really want to get out of it is something like this :-

+6124465 2577 191
+6124471 2536 323
+6124472 2536 2013
+6124473 2537 245
+6124474 2537 990

Where the above PHONE,POSTCODE combination has the highest PCOUNT


Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-08-02 : 06:16:05
Dunno if it can be done in one pass, but this should at least do the trick

SELECT phone,
postcode,
[T_RowCount]
FROM
(
SELECT [T_RowNumber] = ROW_NUMBER()
OVER
(
PARTITION BY phone
ORDER BY T_RowCount DESC
),
*
FROM
(
SELECT [T_RowCount] = COUNT(*)
OVER
(
PARTITION BY phone, postcode
),
phone, postcode
FROM MyTable
) AS X
) AS T
WHERE T_RowNumber = 1
ORDER BY phone
Go to Top of Page

fuzznucket
Starting Member

7 Posts

Posted - 2010-08-02 : 06:44:51
Hey Kirsten,
Thanks for the input :)
I tried that, and it returned the right number of rows, but unfortunately it didn't seem to find the highest value of pcount so had the wrong postcodes returned, if u see what I mean.

IE for
phone postcode pcount
+6124735 2750 1273

should have been the result

but the query produced
+6124735 2760 1

The data for this phone number looked like :-

phone postcode (No column name)
+6124735 2800 1
+6124735 2428 2
+6124735 2747 2
+6124735 2756 2
+6124735 2760 2
+6124735 2773 2
+6124735 2145 3
+6124735 2740 3
+6124735 2753 3
+6124735 2774 3
+6124735 2777 3
+6124735 2749 5
+6124735 2751 6
+6124735 2750 1273

So its kinda picked one in the middle of the list which is odd ... any ideas ?
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-08-02 : 06:59:51
left(phone,8)

missed that bit, sorry. You'll need to put that in the PARTITION statements (both of them)
Go to Top of Page

fuzznucket
Starting Member

7 Posts

Posted - 2010-08-02 : 07:03:23
Hi Kristen...

Tried that but still no tamales. :)

The phone field actually only has 8 digits in this table
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-08-02 : 07:13:05
Post your query so we can check/review it please
Go to Top of Page

fuzznucket
Starting Member

7 Posts

Posted - 2010-08-02 : 07:33:49
Hi Kristen,

This is the query ...

SELECT phone,
postcode,
[T_RowCount]
FROM
(
SELECT [T_RowNumber] = ROW_NUMBER()
OVER
(
PARTITION BY left(phone,8)
ORDER BY T_RowCount desc
),
*
FROM
(
SELECT [T_RowCount] = COUNT(*)
OVER
(
PARTITION BY left(phone,8), postcode
),
phone, postcode
FROM [phonecounts]
) AS X
) AS T
WHERE T_RowNumber = 1
ORDER BY phone


However, I think I might have sussed it another way. Its a bit of a hack, but as long as the table is ordered the correct way (phone,postcode,count) ascending, which it is, I just used ALTER to ADD an ID field to the table, which is unique, then the following query, and it worked.

SELECT id,phone,postcode,pcount
FROM [phonecounts]
where id in
(select MAX(id) from [phonecounts]
group by phone)

Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-08-02 : 08:14:19
My approach was not using a table already containing counts, but rather just the raw data (multiple rows for the same Phone / postcode).

If you want to use a table that already has Counts then you don't need the inner layer

SELECT phone,
postcode,
[T_RowCount]
FROM
(
SELECT [T_RowNumber] = ROW_NUMBER()
OVER
(
PARTITION BY phone, postcode
ORDER BY phone, postcode, pcount
DESC
),
phone, postcode
FROM MyTable
) AS T
WHERE T_RowNumber = 1
ORDER BY phone


or (if the table has Counts and DUPLICATES on Phone Number / postcode combination then use SUM() instead of of COUNT())

...
FROM
(
SELECT [T_RowCount] = SUM(pcount)
OVER
(
PARTITION BY phone, postcode
),
phone, postcode
FROM MyTable
) AS X
...
Go to Top of Page
   

- Advertisement -