Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
DeveloperIQ
Yak Posting Veteran
71 Posts |
Posted - 2009-02-20 : 02:44:38
|
| I have a table with duplicate records for each category type.For example, the name 'John' can occur twice in category A and three times in Category B and o times in Category C. Similarily, Jane can exist in Category B twice and maybe 3 times in A and 2 times in C, but I count it as one again. So basically, I need to see how many names in One category exist in the other category and report it as a a matrix below. Can anyone help? A B CA 1 0B 1 3C 1 2 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2009-02-20 : 04:08:05
|
| [code]--sample dataselect * into #Tempfrom(select 'john' as nam,'A' as cat union allselect 'philip' as nam,'A' as cat union allselect 'john' as nam,'C' as cat union allselect 'mary' as nam,'A' as cat union allselect 'mary' as nam,'B' as cat union allselect 'philip' as nam,'D' as cat union allselect 'mary' as nam,'D' as cat union allselect 'john' as nam,'B' as cat)tselect c.cat,sum(r.A) AS A,sum(r.B) AS B,sum(r.C) AS C,sum(r.D) AS Dfrom (select cat,nam from #Temp)ccross apply(select *from(select cat,namfrom #Tempwhere nam=c.nam)mpivot (count(nam) for cat in ([A],[B],[C],[D]))p)rgroup by c.catdrop table #Tempoutput-------------------------------------------cat A B C DA 3 2 1 2B 2 2 1 1C 1 1 1 0D 2 1 0 2[/code] |
 |
|
|
DeveloperIQ
Yak Posting Veteran
71 Posts |
Posted - 2009-02-20 : 09:45:11
|
| Thanks Visakh. That is a great solution. One more question on this.. if I were to extend this to include more fields besides just name, how can I do that? |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2009-02-20 : 09:50:16
|
| what all counts you want to include? show some sample data for that |
 |
|
|
DeveloperIQ
Yak Posting Veteran
71 Posts |
Posted - 2009-02-20 : 09:56:41
|
| I have FirstName, LastName, Address1, Address2, City, State and Zip for each of the categories. I would like to do what you have done with just the name to include these other fields as well. Please ask if this does not make sense |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2009-02-20 : 10:00:39
|
| show your reqmt by means of some sample data please |
 |
|
|
sodeep
Master Smack Fu Yak Hacker
7174 Posts |
Posted - 2009-02-20 : 10:01:15
|
| He is referring to this one:http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=120177 |
 |
|
|
DeveloperIQ
Yak Posting Veteran
71 Posts |
Posted - 2009-02-20 : 10:09:38
|
| No That one is different SODEEP.The sample data looks like thisselect * into #Tempfrom(select 'JOHN' as FIRSTNAME, 'DOE' AS LASTNAME, 'ADDRESS1' AS ADDRESS1, 'ADDRESS2' AS ADDRESS2, 'STATE' AS STATECODE,'CITY' AS CITY,'12345' AS ZIP,'ABC' as CATEGORY union ALLselect 'JOHN' as FIRSTNAME, 'DOE' AS LASTNAME, 'ADDRESS1' AS ADDRESS1, 'ADDRESS2' AS ADDRESS2, 'STATE' AS STATECODE,'CITY' AS CITY,'12345' AS ZIP,'ABC' as CATEGORY union allselect 'JOHN' as FIRSTNAME, 'DOE' AS LASTNAME, 'ADDRESS1' AS ADDRESS1, 'ADDRESS2' AS ADDRESS2, 'STATE' AS STATECODE,'CITY' AS CITY,'12345' AS ZIP,'BCD' as CATEGORY union allselect 'JANE' as FIRSTNAME, 'DOE' AS LASTNAME, 'ADDRESS1' AS ADDRESS1, 'ADDRESS2' AS ADDRESS2, 'STATE' AS STATECODE,'CITY' AS CITY,'12345' AS ZIP,'ABD' as CATEGORY union allselect 'JANE' as FIRSTNAME, 'DOE' AS LASTNAME, 'ADDRESS1' AS ADDRESS1, 'ADDRESS2' AS ADDRESS2, 'STATE' AS STATECODE,'CITY' AS CITY,'12345' AS ZIP,'DEF' as CATEGORY union allselect 'SAM' as FIRSTNAME, 'WOO' AS LASTNAME, 'ADDRESS1' AS ADDRESS1, 'ADDRESS2' AS ADDRESS2, 'STATE' AS STATECODE,'CITY' AS CITY,'12345' AS ZIP,'ABD' as CATEGORY union allselect 'SAM' as FIRSTNAME, 'WOO' AS LASTNAME, 'ADDRESS1' AS ADDRESS1, 'ADDRESS2' AS ADDRESS2, 'STATE' AS STATECODE,'CITY' AS CITY,'12345' AS ZIP,'DEF' as CATEGORY union allselect 'JOHN' as FIRSTNAME, 'DOE' AS LASTNAME, 'ADDRESS1' AS ADDRESS1, 'ADDRESS2' AS ADDRESS2, 'STATE' AS STATECODE,'CITY' AS CITY,'12345' AS ZIP,'FIG' as CATEGORY union allselect 'JANE' as FIRSTNAME, 'DOE' AS LASTNAME, 'ADDRESS1' AS ADDRESS1, 'ADDRESS2' AS ADDRESS2, 'STATE' AS STATECODE,'CITY' AS CITY,'12345' AS ZIP,'DEF' as CATEGORY )tAnd the output should be exactly like you had above. Please note that in the above sample, though John Doe exists twice in Category ABC, we only count him as 1 for that category |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2009-02-20 : 10:14:29
|
| would you mind posting sample of output you want? i'm not sure what all columns you expect |
 |
|
|
DeveloperIQ
Yak Posting Veteran
71 Posts |
Posted - 2009-02-20 : 10:25:02
|
| category ABC BCD ABD DEF FIGABC 2 1 2 3 4BCD 1 3 1 4 2ABD 3 0 1 2 6DEF 0 1 2 2 1FIG 2 4 0 1This is just a sample output. These are not correct numbers based on my example. Also I ran your solution on the real data (190K rows) and it is still going after about 9 minutes |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2009-02-20 : 10:28:51
|
try like belowselect c.cat,sum(r.A) AS A,sum(r.B) AS B,sum(r.C) AS C,sum(r.D) AS Dfrom (select cat,FirstName, LastName, Address1, Address2, City, State,Zip from #Temp)ccross apply(select *from(select cat,namfrom #Tempwhere FirstName=c.FirstNameand LastName=c.LastNameand Address1=c.Address1and Address2=c.Address2and City=c.Cityand State=c.Stateand Zip=c.Zip)mpivot (count(nam) for cat in ([A],[B],[C],[D]))p)rgroup by c.cat |
 |
|
|
DeveloperIQ
Yak Posting Veteran
71 Posts |
Posted - 2009-02-20 : 10:29:30
|
| Please ignore my last comment about time. I created indexes on my temp table which increased the speed massively (a few seconds) |
 |
|
|
DeveloperIQ
Yak Posting Veteran
71 Posts |
Posted - 2009-02-20 : 11:37:15
|
| I get the desired result, but seems confusing.I have 100K records for a certain category, but when I apply this query, I only get about 37K for that under it's own nameIs that a count of duplicates within the same query or is it the count of total rows? I am confused. Please helpFor example,CATEGORY ABCABC 37000 |
 |
|
|
DeveloperIQ
Yak Posting Veteran
71 Posts |
Posted - 2009-02-20 : 11:40:58
|
| And when I eliminate some of the columns, then I get more than 100k for that category, although there are only 100K total records for that category. for exampleCategory ABCABC 120000 |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2009-02-20 : 11:44:07
|
is this any better?select c.cat,sum(r.A) AS A,sum(r.B) AS B,sum(r.C) AS C,sum(r.D) AS Dfrom (select cat,FirstName, LastName, Address1, Address2, City, State,Zip from #Temp)ccross apply(select *from(select distinct cat,namfrom #Tempwhere FirstName=c.FirstNameand LastName=c.LastNameand Address1=c.Address1and Address2=c.Address2and City=c.Cityand State=c.Stateand Zip=c.Zip)mpivot (count(nam) for cat in ([A],[B],[C],[D]))p)rgroup by c.cat |
 |
|
|
DeveloperIQ
Yak Posting Veteran
71 Posts |
Posted - 2009-02-20 : 11:50:06
|
| No that didnt help. If I use all the fields to compare, it eliminates a lot of the categories. If I only use a few fields, it brings more categories back but the counts are wrong.If you like, I can email you my actual table? |
 |
|
|
DeveloperIQ
Yak Posting Veteran
71 Posts |
Posted - 2009-02-20 : 11:53:59
|
| Visakh, would you like to remote into my PC? That will be easier to troubleshoot. |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2009-02-20 : 11:59:12
|
quote: Originally posted by DeveloperIQ No that didnt help. If I use all the fields to compare, it eliminates a lot of the categories. If I only use a few fields, it brings more categories back but the counts are wrong.If you like, I can email you my actual table?
ok. what about this?select c.cat,sum(r.A) AS A,sum(r.B) AS B,sum(r.C) AS C,sum(r.D) AS Dfrom (select distinct cat,FirstName, LastName, Address1, Address2, City, State,Zip from #Temp)ccross apply(select *from(select distinct cat,namfrom #Tempwhere FirstName=c.FirstNameand LastName=c.LastNameand Address1=c.Address1and Address2=c.Address2and City=c.Cityand State=c.Stateand Zip=c.Zip)mpivot (count(nam) for cat in ([A],[B],[C],[D]))p)rgroup by c.cat |
 |
|
|
DeveloperIQ
Yak Posting Veteran
71 Posts |
Posted - 2009-02-20 : 12:04:08
|
| No that didnt make any difference either. |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2009-02-20 : 12:06:30
|
| for the ones which bring more value either use distinct or group by to make them 1 per row. also it would be helpful if you can post some data to show what you mean by multiples |
 |
|
|
DeveloperIQ
Yak Posting Veteran
71 Posts |
Posted - 2009-02-20 : 12:17:08
|
| With TITLE, FILETYPE, FIRSTNAME, LASTNAME, ADDRESS1, ADDRESS2, CITY, STATE, POSTALCODE fields, the output isFILETYPE PPO SIW FMS SEW GEJ PAK AGC LUI LED VAD KDB ZRP RPS PAC CGW CPI MONFMS 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0LUI 0 0 0 0 0 0 0 37347 0 3 0 0 0 0 0 0 0PAK 0 0 0 0 0 191 0 0 0 0 0 0 0 0 0 0 0SIW 0 45 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0VAD 0 1 0 0 0 0 0 3 0 99 0 0 0 0 0 0 0When I add more fields, I only get one row of data. Some sample data looks like thisACCOUNTNO TITLE FIRSTNAME LASTNAME SUFFIX ADDRESS1 ADDRESS2 ADDRESS3 ADDRESS4 ADDRESS5 CITY STATE POSTALCODE COUNTRY GENDER BIRTHDATE PHONE EMAIL PURCHASETYPE FILETYPENULL Mr W Davidson NULL 28 Cairnside Cults NULL NULL NULL Aberdeen NULL AB15 9NZ UK NULL NULL NULL NULL TRAVEL CGWNULL Mr Stewart I Greig NULL 168 Oldcroft Place NULL NULL NULL NULL Aberdeen NULL AB16 5UJ UK NULL NULL NULL NULL TRAVEL CGWNULL Mr Raymond Smith NULL 8 Cruden Place NULL NULL NULL NULL Aberdeen NULL AB16 7RJ UK NULL NULL NULL NULL TRAVEL CGWNULL Mr Bill Campbell NULL Strathlene Meldrum Drive Newmachar Aberdeen NULL NULL Aberdeenshire NULL AB21 0PH UK NULL NULL NULL NULL TRAVEL CGWNULL Mr Leslie McBain NULL 23 Valentine Drive Danestone NULL NULL NULL Aberdeen NULL AB22 8YF UK NULL NULL NULL NULL TRAVEL CGWNULL Mr Al Reco NULL 27 Bayview Court NULL NULL NULL NULL Aberdeen NULL AB24 1WA UK NULL NULL NULL NULL TRAVEL CGWNULL Ms Dorothy Christie NULL 65 Formartine Road NULL NULL NULL NULL Aberdeen NULL AB24 2QX UK NULL NULL NULL NULL TRAVEL CGWNULL Mrs E Shanol NULL 166 Hilton Drive NULL NULL NULL NULL Aberdeen NULL AB24 4LQ UK NULL NULL NULL NULL TRAVEL CGWNULL Mrs M Watson NULL 198 Westburn Road NULL NULL NULL NULL Aberdeen NULL AB25 2LT UK NULL NULL NULL NULL TRAVEL CGWNULL Mr A Bennett NULL 17 Highfield Avenue NULL NULL NULL NULL Banchory NULL AB31 4FB UK NULL NULL NULL NULL TRAVEL CGW |
 |
|
|
Next Page
|
|
|
|
|