Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2012 Forums
 Transact-SQL (2012)
 Deduplicating a List

Author  Topic 

newnoise
Starting Member

3 Posts

Posted - 2013-06-24 : 19:20:00
I have a table with 3 columns: unique ID, spouse ID, and a market rating. I'm trying to figure out how to deduplicate the list so that only one spouse per married couple appears in the list. Here's an example:

ID SpouseID Rating
1 NULL High
2 5 Low
3 4 NULL
4 3 Low
5 2 NULL
6 NULL NULL
7 10 High
8 15 Low
9 22 NULL
10 7 NULL


I want to come up with code that will select only one member per couple included in the list but to privilege the spouse that has a value in the rating column. One possible fix I came up with (that doesn't work) is the following:

select distinct
case
when ID > spouseID then ID
when spouseID > ID then spouseID
else ID
end
from table


If you look at the original data, you'll see that the problem with this code is that it will end up keeping ID 5 and 10, even though those cases have a NULL value in the Rating column. If I were to switch the sign of the code from "greater than" (>) to "less than" (<), the code would keep ID 3, even though it has a NULL value in the Rating column.

My question, then: How do I get this list of 10 cases to return the following IDs: 1, 2, 4, 6, 7, 8, and 9?

SQL Server 2012

MuMu88
Aged Yak Warrior

549 Posts

Posted - 2013-06-24 : 20:49:36
Just add a where clause

select distinct
case
when ID > spouseID then ID
when spouseID > ID then spouseID
else ID
end
from table
WHERE Rating IS NOT NULL

Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2013-06-25 : 01:20:08
[code]
SELECT t.ID,t.SpouseID,COALESCE(t.Rating,t1.Rating) AS Rating
FROM table t
OUTER APPLY (SELECT COUNT(*) AS Cnt
FROM table
WHERE ID = t.SpouseID
AND SpouseID = t.ID
AND Rating IS NOT NULL
)t1
WHERE COALESCE(Cnt,0)=0
[/code]


------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page

MuMu88
Aged Yak Warrior

549 Posts

Posted - 2013-06-25 : 13:24:04
Is this what you want:
[CODE]

; WITH CTE AS
(SELECT ID FROM @Temp where Rating is not NULL)
SELECT ID FROM @Temp T WHERE ID NOT IN(SELECT ID FROM CTE) and
COALESCE(SpouseID, 0) NOT IN (SELECT ID FROM CTE)
UNION SELECT ID from CTE;

[/CODE]
Go to Top of Page

James K
Master Smack Fu Yak Hacker

3873 Posts

Posted - 2013-06-25 : 14:01:15
quote:
Originally posted by visakh16


SELECT t.ID,t.SpouseID,COALESCE(t.Rating,t1.Rating) AS Rating
FROM table t
OUTER APPLY (SELECT COUNT(*) AS Cnt
FROM table
WHERE ID = t.SpouseID
AND SpouseID = t.ID
AND Rating IS NOT NULL
)t1
WHERE COALESCE(Cnt,0)=0



------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs


SELECT t.ID,t.SpouseID,COALESCE(t.Rating,t1.Rating) AS Rating
....
t1 does not have a rating column, and even if I add that to the inner query, the logic doesn't seem right.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2013-06-26 : 02:52:42
ah...sorry...couldnt test it then


declare @temp table
(
ID int,
SpouseID int,
Rating varchar(10)
)

insert @temp
values(1, NULL, 'High'),
(2, 5, 'Low'),
(3, 4, NULL),
(4, 3, 'Low'),
(5, 2, NULL),
(6, NULL, NULL),
(7, 10, 'High'),
(8, 15, 'Low'),
(9, 22, NULL),
(10, 7, NULL),
(22,9,NULL)

SELECT t.ID,t.SpouseID,t.Rating
FROM @temp t
OUTER APPLY (SELECT COUNT(CASE WHEN Rating IS NOT NULL THEN 1 END) AS RCnt,
COUNT(*) AS Cnt
FROM @temp
WHERE ID = t.SpouseID
AND SpouseID = t.ID
)t1
WHERE (Rating IS NOT NULL AND COALESCE(RCnt,0)=0)
OR (ID > COALESCE(SpouseID,-1) AND COALESCE(Cnt,0) = 1)



----------------------------------
ID SpouseID Rating
----------------------------------
1 NULL High
2 5 Low
4 3 Low
5 2 NULL
6 NULL NULL
7 10 High
8 15 Low
10 7 NULL
22 9 NULL



------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page

James K
Master Smack Fu Yak Hacker

3873 Posts

Posted - 2013-06-26 : 09:20:05
quote:
Originally posted by visakh16

ah...sorry...couldnt test it then


declare @temp table
(
ID int,
SpouseID int,
Rating varchar(10)
)

insert @temp
values(1, NULL, 'High'),
(2, 5, 'Low'),
(3, 4, NULL),
(4, 3, 'Low'),
(5, 2, NULL),
(6, NULL, NULL),
(7, 10, 'High'),
(8, 15, 'Low'),
(9, 22, NULL),
(10, 7, NULL),
(22,9,NULL)

SELECT t.ID,t.SpouseID,t.Rating
FROM @temp t
OUTER APPLY (SELECT COUNT(CASE WHEN Rating IS NOT NULL THEN 1 END) AS RCnt,
COUNT(*) AS Cnt
FROM @temp
WHERE ID = t.SpouseID
AND SpouseID = t.ID
)t1
WHERE (Rating IS NOT NULL AND COALESCE(RCnt,0)=0)
OR (ID > COALESCE(SpouseID,-1) AND COALESCE(Cnt,0) = 1)



----------------------------------
ID SpouseID Rating
----------------------------------
1 NULL High
2 5 Low
4 3 Low
5 2 NULL
6 NULL NULL
7 10 High
8 15 Low
10 7 NULL
22 9 NULL



------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs


Something doesn't seem right in that solution - for example, the (5,2) pair is repeated (as 5,2 and 2,5).

BTW, newnoise, MuMu's solution produces the right results without any duplicates.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2013-06-26 : 09:34:16
quote:
Originally posted by James K

quote:
Originally posted by visakh16

ah...sorry...couldnt test it then


declare @temp table
(
ID int,
SpouseID int,
Rating varchar(10)
)

insert @temp
values(1, NULL, 'High'),
(2, 5, 'Low'),
(3, 4, NULL),
(4, 3, 'Low'),
(5, 2, NULL),
(6, NULL, NULL),
(7, 10, 'High'),
(8, 15, 'Low'),
(9, 22, NULL),
(10, 7, NULL),
(22,9,NULL)

SELECT t.ID,t.SpouseID,t.Rating
FROM @temp t
OUTER APPLY (SELECT COUNT(CASE WHEN Rating IS NOT NULL THEN 1 END) AS RCnt,
COUNT(*) AS Cnt
FROM @temp
WHERE ID = t.SpouseID
AND SpouseID = t.ID
)t1
WHERE (Rating IS NOT NULL AND COALESCE(RCnt,0)=0)
OR (ID > COALESCE(SpouseID,-1) AND COALESCE(Cnt,0) = 1)



----------------------------------
ID SpouseID Rating
----------------------------------
1 NULL High
2 5 Low
4 3 Low
5 2 NULL
6 NULL NULL
7 10 High
8 15 Low
10 7 NULL
22 9 NULL



------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs


Something doesn't seem right in that solution - for example, the (5,2) pair is repeated (as 5,2 and 2,5).

BTW, newnoise, MuMu's solution produces the right results without any duplicates.



for the OP's sample data even this works

declare @temp table
(
ID int,
SpouseID int,
Rating varchar(10)
)

insert @temp
values(1, NULL, 'High'),
(2, 5, 'Low'),
(3, 4, NULL),
(4, 3, 'Low'),
(5, 2, NULL),
(6, NULL, NULL),
(7, 10, 'High'),
(8, 15, 'Low'),
(9, 22, NULL),
(10, 7, NULL)

SELECT t.ID,t.SpouseID,t.Rating
FROM @temp t
OUTER APPLY (SELECT COUNT(*) AS Cnt
FROM @temp
WHERE ID = t.SpouseID
AND SpouseID = t.ID
AND Rating IS NOT NULL
)t1
WHERE COALESCE(Cnt,0)=0


output
----------------------------------
ID SpouseID Rating
---------------------------------
1 NULL High
2 5 Low
4 3 Low
6 NULL NULL
7 10 High
8 15 Low
9 22 NULL






I was trying to include an additional possibility of duplicates with both NULLs

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2013-06-26 : 10:02:09
This solution works for all scenarios ie including multiple NULL records as well as multiple ratings choosing one with highest rating


declare @temp table
(
ID int,
SpouseID int,
Rating varchar(10)
)

insert @temp
values(1, NULL, 'High'),
(2, 5, 'Low'),
(3, 4, NULL),
(4, 3, 'Low'),
(5, 2, NULL),
(6, NULL, NULL),
(7, 10, 'High'),
(8, 15, 'Low'),
(9, 22, NULL),
(10, 7, NULL),
(11,13,'Low'),
(13,11,'High'),
(22, 9, NULL)

SELECT ID,SpouseID,Rating
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY CASE WHEN ID > SpouseID THEN ID ELSE SpouseID END,CASE WHEN ID > SpouseID THEN SpouseID ELSE ID END ORDER BY CASE Rating WHEN 'High' THEN 2 WHEN 'Low' THEN 1 ELSE 0 END DESC) AS Seq,*
FROM @temp
)t
WHERE Seq=1
ORDER BY ID


output
---------------------------------
ID SpouseID Rating
---------------------------------
1 NULL High
2 5 Low
4 3 Low
6 NULL NULL
7 10 High
8 15 Low
9 22 NULL
13 11 High




------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page
   

- Advertisement -