Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Group by column A but compare column B

Author  Topic 

GarDebs2010
Starting Member

4 Posts

Posted - 2012-09-27 : 15:25:10
This one has had me stumped for the last few hours and at this stage I think I need some help...

I need to compare multiple groups from a single table and to identify where items listed in col B match. For example: -

Col A...............Col B
John................Apple
John................Orange
John................Banana
Mary................Orange
Mary................Strawberry
David...............Apple
David...............Orange
David...............Banana
I want 'John' and 'David' returned because their items in col B match. Hope this makes sense! Thanks in advance! G

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2012-09-27 : 15:43:47
Probably not the most elegant or most efficient, (and probably a little silly) but here is one way:
CREATE TABLE #tmp (cola VARCHAR(32), colb VARCHAR(32))

INSERT INTO #tmp VALUES
('John','Apple'),
('John','Orange'),
('John','Banana'),
('Mary','Orange'),
('Mary','Strawberry'),
('David','Apple'),
('David','Orange'),
('David','Banana');

;WITH cte AS
(
SELECT
a.cola,
b.colbs
FROM
(SELECT DISTINCT cola FROM #tmp) a
CROSS APPLY
(
SELECT colb AS [text()] FROM #tmp c
WHERE c.cola = a.cola
ORDER BY c.colb
FOR XML PATH('')
) b(colbs)
)
SELECT
a.cola,
b.cola
FROM
cte a INNER JOIN cte b ON a.colbs = b.colbs AND a.cola <> b.cola
AND a.cola < b.cola;

DROP TABLE #tmp;
Go to Top of Page

bitsmed
Aged Yak Warrior

545 Posts

Posted - 2012-09-27 : 16:34:49
Alternative to sunitabeck's solution

select a.col_a
from tablename as a
left outer join tablename as b
on b.col_b=a.col_b
and b.col_a<>a.col_a
group by a.col_a
having sum(case when b.col_b is null then 1 else 0 end)=0


Edit: my above sql attempt, did only work with the supplied test data, so I have rewritten the sql to fit all scenarios (I hope)

select col_a
from (select col_a
,count(col_b) col_b_count
from tablename
group by col_a
) as a
inner join (select a.col_a as col_a1
,b.col_a as col_a2
,count(b.col_b) col_b_count
from tablename as a
inner join tablename as b
on b.col_a<>a.col_a
and b.col_b=a.col_b
group by a.col_a
,b.col_a
) as b
on b.col_a1=a.col_a
where a.col_b_count=b.col_b_count
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-09-27 : 17:37:45
quote:
Originally posted by GarDebs2010

This one has had me stumped for the last few hours and at this stage I think I need some help...

I need to compare multiple groups from a single table and to identify where items listed in col B match. For example: -

Col A...............Col B
John................Apple
John................Orange
John................Banana
Mary................Orange
Mary................Strawberry
David...............Apple
David...............Orange
David...............Banana
I want 'John' and 'David' returned because their items in col B match. Hope this makes sense! Thanks in advance! G



declare @test table
(
ColA varchar(100),
ColB varchar(100)
)
insert @test
values('John','Apple'),
('John','Orange'),
('John','Banana'),
('Mary','Orange'),
('Mary','Strawberry'),
('David','Apple'),
('David','Orange'),
('David','Banana')


;With Temp
AS
(
SELECT ColA,
ColB,
STUFF ((SELECT ','+ ColB FROM @test WHERE ColA = t.ColA ORDER BY ColB FOR XML PATH('')),1,1,'') AS ColBList
FROM @test t
)

SELECT t.ColA,t.ColB
FROM Temp t
INNER JOIN (SELECT ColBList
FROM Temp
GROUP BY ColBList
HAVING COUNT(DISTINCT ColA) > 1
)t1

ON t1.ColBList = t.ColBList



output
----------------------------
ColA ColB
----------------------------
John Apple
John Orange
John Banana
David Apple
David Orange
David Banana



------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2012-09-28 : 07:34:04
If you can put up with the slight possibility (albeit very very slight) that binary_checksums may be the same for different strings or that lightning might strike and two different sets of binary_checksums might add up to the same sum, the following may be a little bit faster:

CREATE TABLE #tmp (cola VARCHAR(32), colb VARCHAR(32))

INSERT INTO #tmp VALUES
('John','Apple'),
('John','Orange'),
('John','Banana'),
('Mary','Orange'),
('Mary','Strawberry'),
('David','Apple'),
('David','Orange'),
('David','Banana');

;WITH cte as
(
SELECT
cola,
SUM(BINARY_CHECKSUM(colb)) AS bcs
FROM
#tmp
GROUP BY cola
)
SELECT
a.cola
FROM
cte A
WHERE EXISTS
(
SELECT 1 FROM cte b
WHERE b.bcs = a.bcs
GROUP BY bcs HAVING COUNT(*) > 1
)
Go to Top of Page
   

- Advertisement -