Removing multi-column duplicates

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

Transact-SQL (2000)

Removing multi-column duplicates

Author

Topic

billsox
Yak Posting Veteran

74 Posts

Posted - 2003-05-27 : 17:54:37

Hi -- I've got a table that has the following structure:


Col1     Col2
   X        A
   Y        B
   Z        C
   A        X
   B        Y
   C        Z
etc.

In my situation, I consider the first row to be the same as the
fourth; the second, the same as the fifth; and the third, the same
as the sixth. But how can I remove these multi-column duplicates?
I know there must be some way to do it with a self-join, but I
can't seem to find the correct method. Help anyone?

Bill

JCamburn
Starting Member

31 Posts

Posted - 2003-05-27 : 18:18:46

Are the values in the columns really single letters?

Is the primary key the combination of these two columns?

billsox
Yak Posting Veteran

74 Posts

Posted - 2003-05-27 : 18:23:41

No, I simplified the example to use single letters. In my situation both columns are VARCHAR(6). Columns 1 and 2 both make up the primary key.

Any ideas?

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2003-05-27 : 19:17:18

What about something like this:

delete from
billsox a
where
col2 > col1
AND
exists
(select 1 from billsox b where a.col1 = b.col2 and b.col1 = a.col2)

This says:

Only remove rows where col2 > col1, where there exists a row already in the table that is the opposite of the current row, remove it.

SHould work. test it out. as a SELECT first...

- Jeff

ValterBorges
Master Smack Fu Yak Hacker

1429 Posts

Posted - 2003-05-27 : 19:31:49

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=22882

JCamburn
Starting Member

31 Posts

Posted - 2003-05-27 : 19:32:09

I would use a WHILE loop. But if you need a single SQL statement, you could try the following.


DELETE 
    TableName
FROM
(
    SELECT DISTINCT
        Q3.ColA,
        Q3.ColB
    FROM
    (
        SELECT
            CASE
                WHEN Q1.ColA < Q1.ColB THEN Q1.ColA
                WHEN Q1.ColB < Q1.ColA THEN Q1.ColB
            END AS ColA,
            CASE
                WHEN Q1.ColA < Q1.ColB THEN Q1.ColB
                WHEN Q1.ColB < Q1.ColA THEN Q1.ColA
            END AS ColB
        FROM
            TableName Q1
            INNER JOIN
            TableName Q2
            ON Q1.ColA = Q2.ColB And Q1.ColB = Q2.ColA
        WHERE
            Q1.ColA <> Q1.ColB
    ) AS Q3
) AS Q4
WHERE
    TableName.ColA = Q4.ColA And TableName.ColB = Q4.ColB

billsox
Yak Posting Veteran

74 Posts

Posted - 2003-05-27 : 20:38:43

This looks like it'll work. I'll test it out. Thanks a million!

quote:

What about something like this:

delete from
billsox a
where
col2 > col1
AND
exists
(select 1 from billsox b where a.col1 = b.col2 and b.col1 = a.col2)

This says:

Only remove rows where col2 > col1, where there exists a row already in the table that is the opposite of the current row, remove it.

SHould work. test it out. as a SELECT first...

- Jeff

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources