Group by column A but compare column B

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

New to SQL Server Programming

Group by column A but compare column B

Author

Topic

GarDebs2010
Starting Member

4 Posts

Posted - 2012-09-27 : 15:25:10

This one has had me stumped for the last few hours and at this stage I think I need some help...

I need to compare multiple groups from a single table and to identify where items listed in col B match. For example: -

Col A...............Col B
John................Apple
John................Orange
John................Banana
Mary................Orange
Mary................Strawberry
David...............Apple
David...............Orange
David...............Banana
I want 'John' and 'David' returned because their items in col B match. Hope this makes sense! Thanks in advance! G

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2012-09-27 : 15:43:47

Probably not the most elegant or most efficient, (and probably a little silly) but here is one way:

CREATE TABLE #tmp (cola VARCHAR(32), colb VARCHAR(32))

INSERT INTO #tmp VALUES
('John','Apple'),
('John','Orange'),
('John','Banana'),
('Mary','Orange'),
('Mary','Strawberry'),
('David','Apple'),
('David','Orange'),
('David','Banana');

;WITH cte AS
(
	SELECT
		a.cola,
		b.colbs
	FROM
		(SELECT DISTINCT cola FROM #tmp) a
		CROSS APPLY
		(
			SELECT colb AS [text()] FROM #tmp c
			WHERE c.cola = a.cola
			ORDER BY c.colb
			FOR XML PATH('')
		) b(colbs)
)
SELECT
	a.cola,
	b.cola
FROM
	cte a INNER JOIN cte b ON a.colbs = b.colbs AND a.cola <> b.cola
		AND a.cola < b.cola;

DROP TABLE #tmp;

bitsmed
Aged Yak Warrior

545 Posts

Posted - 2012-09-27 : 16:34:49

Alternative to sunitabeck's solution


select a.col_a
  from tablename as a
       left outer join tablename as b
                    on b.col_b=a.col_b
                   and b.col_a<>a.col_a
 group by a.col_a
 having sum(case when b.col_b is null then 1 else 0 end)=0

Edit: my above sql attempt, did only work with the supplied test data, so I have rewritten the sql to fit all scenarios (I hope)


select col_a
  from (select col_a
              ,count(col_b) col_b_count
          from tablename
         group by col_a
       ) as a
       inner join (select a.col_a as col_a1
                         ,b.col_a as col_a2
                         ,count(b.col_b) col_b_count
                     from tablename as a
                          inner join tablename as b
                                  on b.col_a<>a.col_a
                                 and b.col_b=a.col_b
                    group by a.col_a
                            ,b.col_a
                  ) as b
               on b.col_a1=a.col_a
 where a.col_b_count=b.col_b_count

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-09-27 : 17:37:45

quote:
Originally posted by GarDebs2010

This one has had me stumped for the last few hours and at this stage I think I need some help...

I need to compare multiple groups from a single table and to identify where items listed in col B match. For example: -

Col A...............Col B
John................Apple
John................Orange
John................Banana
Mary................Orange
Mary................Strawberry
David...............Apple
David...............Orange
David...............Banana
I want 'John' and 'David' returned because their items in col B match. Hope this makes sense! Thanks in advance! G


declare @test table
(
ColA varchar(100),
ColB varchar(100)
)
insert @test
values('John','Apple'),
('John','Orange'),
('John','Banana'),
('Mary','Orange'),
('Mary','Strawberry'),
('David','Apple'),
('David','Orange'),
('David','Banana')


;With Temp
AS
(
SELECT ColA,
ColB,
STUFF ((SELECT ','+ ColB FROM @test WHERE ColA = t.ColA ORDER BY ColB FOR XML PATH('')),1,1,'') AS ColBList
FROM @test t
)

SELECT t.ColA,t.ColB
FROM Temp t
INNER JOIN (SELECT ColBList 
            FROM Temp
			GROUP BY ColBList
			HAVING COUNT(DISTINCT ColA) > 1
			)t1

ON t1.ColBList = t.ColBList



output
----------------------------
ColA	ColB
----------------------------
John	Apple
John	Orange
John	Banana
David	Apple
David	Orange
David	Banana

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2012-09-28 : 07:34:04

If you can put up with the slight possibility (albeit very very slight) that binary_checksums may be the same for different strings or that lightning might strike and two different sets of binary_checksums might add up to the same sum, the following may be a little bit faster:


CREATE TABLE #tmp (cola VARCHAR(32), colb VARCHAR(32))

INSERT INTO #tmp VALUES
('John','Apple'),
('John','Orange'),
('John','Banana'),
('Mary','Orange'),
('Mary','Strawberry'),
('David','Apple'),
('David','Orange'),
('David','Banana');

;WITH cte as 
(
	SELECT
		cola,
		SUM(BINARY_CHECKSUM(colb)) AS bcs
	FROM
		#tmp
	GROUP BY cola
)
SELECT
	a.cola
FROM
	cte A
WHERE EXISTS
(
	SELECT 1 FROM cte b 
	WHERE b.bcs = a.bcs 
	GROUP BY bcs HAVING COUNT(*) > 1
)

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources