Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
fuzznucket
Starting Member
7 Posts |
Posted - 2010-08-01 : 20:00:36
|
Hey guys, I was wondering if anyone can give me an idea how I might resolve a problem of finding the postal code of some orphan telephone numbers. Essentially I need to determine the *most* common occurance of postcode for a given database of phone numbers. My dataset is quite comprehensive, around 2m phone numbers, and I have postcodes for around 60% of the records. I'm reasoning that where I have the data as :-PHONE,POSTCODE+61290201111,2000+61290202222,2000+61290203333,2000+61290204444,2000+61290205555,2416Therefore the most common postcodefor the prefix +6129020 is 2000, and I just want to store that in a table, so I can use that to classify my other records which don't have postcodes. (I'm assuning that some records have incorrect postcode records, hence the reason I want the most common one)I've tried a GROUP BY query :-select count(*), postcode, left(phone,8) from maindatawhere phone is not null and postcode is not nullgroup by postcode, left(phone,8) order by left(phone,8), count(*) desc However this just lists all the permutations. I'm quite new to SQL, and suspect that the MAX function might help me, but cant see how to structure it. Any help greatly appreciated. |
|
|
slimt_slimt
Aged Yak Warrior
746 Posts |
Posted - 2010-08-02 : 02:00:21
|
try something like this if i understand your problem correctly:select count(*) as nof_occ ,postcodefrom maindatawhere phone is not nulland postcode is not nullgroup by postcode-- or try thisselectsum(nof_occ) as nof,postcode,left(phone,8)from(select count(*) as nof_occ ,postcodefrom maindatawhere phone is not nulland postcode is not nullgroup by postcode) as x join maindata as m on m.postcode = x.postcodegroup by postcode ,left(phone,8) |
 |
|
|
fuzznucket
Starting Member
7 Posts |
Posted - 2010-08-02 : 05:08:54
|
| Hey Slim, Thanks very much for taking the time to reply. I don't know if its because I'm running MS-SQL server, but I couldn't get the second example to work. The first error was :-Msg 209, Level 16, State 1, Line 18Ambiguous column name 'postcode'.Msg 209, Level 16, State 1, Line 3Ambiguous column name 'POSTCODE'So I tried running the SELECT part on its own, as :-(select count(*) as nof_occ ,postcode from ozmedia.dbo.maindatawhere phone is not nulland postcode is not nullgroup by postcode) as x join dbo.maindata as m on m.postcode = x.postcodebut it didn't like the AS X part. Any ideas? |
 |
|
|
fuzznucket
Starting Member
7 Posts |
Posted - 2010-08-02 : 05:18:30
|
| I played around with it a little more, and changed the postcode to x.postcode as :-select sum(nof_occ) as nof,x.POSTCODE,left(phone,8)from(select postcode,count(*) as nof_occfrom maindatawhere phone is not nulland postcode is not nullgroup by postcode) as x join maindata as m on m.postcode = x.postcodegroup by x.POSTCODE,left(phone,8)The query started to run, but came up with the error :-Msg 8115, Level 16, State 2, Line 1Arithmetic overflow error converting expression to data type int.If it helps, the phone number field is varchar(50) and the postcode is an INT |
 |
|
|
fuzznucket
Starting Member
7 Posts |
Posted - 2010-08-02 : 05:36:44
|
| If it helps, I've been able to create a table that has the data in the following format :-phone postcode pcount+6124465 2428 2+6124465 2537 3+6124465 2535 11+6124465 2577 191+6124471 2546 2+6124471 2229 7+6124471 2536 323+6124472 2753 1+6124472 2601 3+6124472 2550 5+6124472 2537 22+6124472 2536 2013+6124473 2800 1+6124473 2536 2+6124473 2630 2+6124473 2545 100+6124473 2546 140+6124473 2537 245+6124474 4207 2+6124474 2541 3+6124474 2536 99+6124474 2537 990But what I really want to get out of it is something like this :-+6124465 2577 191+6124471 2536 323+6124472 2536 2013+6124473 2537 245+6124474 2537 990Where the above PHONE,POSTCODE combination has the highest PCOUNT |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-08-02 : 06:16:05
|
Dunno if it can be done in one pass, but this should at least do the trickSELECT phone, postcode, [T_RowCount]FROM( SELECT [T_RowNumber] = ROW_NUMBER() OVER ( PARTITION BY phone ORDER BY T_RowCount DESC ), * FROM ( SELECT [T_RowCount] = COUNT(*) OVER ( PARTITION BY phone, postcode ), phone, postcode FROM MyTable ) AS X) AS TWHERE T_RowNumber = 1ORDER BY phone |
 |
|
|
fuzznucket
Starting Member
7 Posts |
Posted - 2010-08-02 : 06:44:51
|
| Hey Kirsten, Thanks for the input :) I tried that, and it returned the right number of rows, but unfortunately it didn't seem to find the highest value of pcount so had the wrong postcodes returned, if u see what I mean. IE for phone postcode pcount+6124735 2750 1273should have been the resultbut the query produced +6124735 2760 1The data for this phone number looked like :-phone postcode (No column name)+6124735 2800 1+6124735 2428 2+6124735 2747 2+6124735 2756 2+6124735 2760 2+6124735 2773 2+6124735 2145 3+6124735 2740 3+6124735 2753 3+6124735 2774 3+6124735 2777 3+6124735 2749 5+6124735 2751 6+6124735 2750 1273So its kinda picked one in the middle of the list which is odd ... any ideas ? |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-08-02 : 06:59:51
|
| left(phone,8)missed that bit, sorry. You'll need to put that in the PARTITION statements (both of them) |
 |
|
|
fuzznucket
Starting Member
7 Posts |
Posted - 2010-08-02 : 07:03:23
|
| Hi Kristen...Tried that but still no tamales. :) The phone field actually only has 8 digits in this table |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-08-02 : 07:13:05
|
| Post your query so we can check/review it please |
 |
|
|
fuzznucket
Starting Member
7 Posts |
Posted - 2010-08-02 : 07:33:49
|
| Hi Kristen, This is the query ...SELECT phone, postcode, [T_RowCount]FROM( SELECT [T_RowNumber] = ROW_NUMBER() OVER ( PARTITION BY left(phone,8) ORDER BY T_RowCount desc ), * FROM ( SELECT [T_RowCount] = COUNT(*) OVER ( PARTITION BY left(phone,8), postcode ), phone, postcode FROM [phonecounts] ) AS X) AS TWHERE T_RowNumber = 1ORDER BY phoneHowever, I think I might have sussed it another way. Its a bit of a hack, but as long as the table is ordered the correct way (phone,postcode,count) ascending, which it is, I just used ALTER to ADD an ID field to the table, which is unique, then the following query, and it worked. SELECT id,phone,postcode,pcount FROM [phonecounts] where id in (select MAX(id) from [phonecounts] group by phone) |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-08-02 : 08:14:19
|
My approach was not using a table already containing counts, but rather just the raw data (multiple rows for the same Phone / postcode).If you want to use a table that already has Counts then you don't need the inner layerSELECT phone, postcode, [T_RowCount]FROM( SELECT [T_RowNumber] = ROW_NUMBER() OVER ( PARTITION BY phone, postcode ORDER BY phone, postcode, pcount DESC ), phone, postcode FROM MyTable) AS TWHERE T_RowNumber = 1ORDER BY phone or (if the table has Counts and DUPLICATES on Phone Number / postcode combination then use SUM() instead of of COUNT())... FROM ( SELECT [T_RowCount] = SUM(pcount) OVER ( PARTITION BY phone, postcode ), phone, postcode FROM MyTable ) AS X... |
 |
|
|
|
|
|
|
|