Select Unique Combination w/ Same Data Columns

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2008 Forums

Transact-SQL (2008)

Select Unique Combination w/ Same Data Columns

Author

Topic

loneblade
Starting Member

3 Posts

Posted - 2010-04-15 : 14:15:05

I have a table with 4 columns as a result from a fuzzy match. The first 2 columns contain the same set of data as the last 2. The column names are Category1, ID1, Category2, and ID2. The combination of category and id are need to identify uniqueness in the record that I will be joining with later.

Is there a way to to select only the unique combination where that if [Combination of Category and ID] 1 and [Combination of Category and ID] 2 shows up in the one row, the reverse will not appear in the later rows?

Hopefully this makes sense.

hanbingl
Aged Yak Warrior

652 Posts

Posted - 2010-04-15 : 18:04:03

[CODE]

DECLARE @T1 TABLE (CAT1 VARCHAR(1), ID1 INT, CAT2 VARCHAR(1), ID2 INT)
INSERT INTO @T1
SELECT 'A',1,'B',2
UNION ALL
SELECT 'B',2,'A',1
UNION ALL
SELECT 'C',2,'A',1
UNION ALL
SELECT 'D',2,'A',1

SELECT CAT1, ID1, CAT2, ID2 FROM @T1 Z
WHERE NOT EXISTS (
SELECT 1 FROM
(
SELECT A.*, RANK() OVER(ORDER BY A.CAT1) AS RNK FROM @T1 A
JOIN @T1 B ON
A.CAT1 = B.CAT2
AND
A.ID1 = B.ID2
AND
A.CAT2 = B.CAT1
AND
A.ID2 = B.ID1
)T
WHERE Z.CAT1 = T.CAT1 AND Z.ID1 = T.ID1 AND Z.CAT2 = T.CAT2 AND Z.ID2 = T.ID2
AND RNK > 1
)
[/CODE]

quote:
Originally posted by loneblade

I have a table with 4 columns as a result from a fuzzy match. The first 2 columns contain the same set of data as the last 2. The column names are Category1, ID1, Category2, and ID2. The combination of category and id are need to identify uniqueness in the record that I will be joining with later.

Is there a way to to select only the unique combination where that if [Combination of Category and ID] 1 and [Combination of Category and ID] 2 shows up in the one row, the reverse will not appear in the later rows?

Hopefully this makes sense.

jcelko
Esteemed SQL Purist

547 Posts

Posted - 2010-04-20 : 12:50:37

This would make a lot more sense with sample data, clear specs and DDL (keys, constriatns, all that stuff you don't get in a vague narrative).

>> I have a table with 4 columns as a result from a fuzzy match. The first 2 columns contain the same set of data as the last 2. <<

Then the last two columns are redundant

>> The column names are Category1, ID1, Category2, and ID2. The combination of category and id are need to identify uniqueness in the record [sic] that I will be joining with later. <<

Can I change the data element names to follow ISO-11179 rules instead of having repeating groups implied by array subscripts? I am fond of First Normal Form.

>> Is there a way to to select only the unique combination where that if [Combination of Category and ID] 1 and [Combination of Category and ID] 2 shows up in the one row, the reverse will not appear in the later rows? <<

Since a table has no ordering, what does "later rows" mean to you? Now, in a sequential file "following records" would be correct. You also used the term "record" before.

>> Hopefully this makes sense.<<

Nope. After a few decades of reading bad SQL and specs, I think you might mean something like this:

For a set with two columns, we do not want to have the same combination (as opposed to permutation) of values. That is
{(salt, pepper), (pepper, salt)} are a redundancy.

In SQL we can prohibit this with DDL:

CREATE TABLE Condiments
(first_condiment_name CHAR(15) NOT NULL,
second_condiment_name CHAR(15) NOT NULL,
PRIMARY KEY (first_condiment_name, second_condiment_name),
CHECK (first_condiment_name < second_condiment_name)
);

Your four-column table sounds like an attempt to extend this model. Did I guess right?

A cute trick is to create a derived table with the UNION of the flip of the column and a GROUP BY:

SELECT DISTINCT X.c1, X.c2
FROM (SELECT first_condiment_name, second_condiment_name
FROM Condiments
UNION
SELECT second_condiment_name, first_condiment_name
FROM Condiments) AS X(c1, c2)
GROUP BY X.c1, X.c2
HAVING COUNT(*) >= 2;

--CELKO--
Joe Celko, SQL Guru

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources