Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 deleting/removing duplicates over multiple columns

Author  Topic 

sqlchiq
Posting Yak Master

133 Posts

Posted - 2008-08-11 : 12:39:45


I'm trying to remove duplicate entries over an entire row that spans multiple columns...

In my table I have the following columns

shipname, saildate, itinerary, interiorprice, oceanviewprice,
balconyprice, detailslink, time, ratetype, suiteprice, id_num

Each row represents a trip, with the above details.

I have a bunch of duplicates entries, I know how to deal with the problem when only one column is duplicates, but here the entire row is a duplicate entry. (except for the id_num and time column, which may actually be unique)

I used the following code to find all dupes

SELECT t1.shipname,  t1.saildate, t1.itinerary, t1.interiorprice, t1.oceanviewprice,
t1.balconyprice, t1.detailslink, t1.time, t1.ratetype, t1.suiteprice,
t1.id_num
FROM carnivalpricing as t1, carnivalpricing as t2
where t1.shipname = t2.shipname and t1.saildate = t2.saildate and t1.itinerary = t2.itinerary
and t1.interiorprice = t2.interiorprice and t1.oceanviewprice = t2.oceanviewprice
and t1.balconyprice = t2.balconyprice
and t1.ratetype = t2.ratetype and t1.suiteprice = t2.suiteprice
order by saildate, interiorprice


and it seemed to pull up 13,000 rows of duplicates.

So i thought if I just replaced the = with <> like so...

SELECT t1.shipname,  t1.saildate, t1.itinerary, t1.interiorprice, t1.oceanviewprice,
t1.balconyprice, t1.detailslink, t1.executiontime, t1.ratetype, t1.suiteprice,
t1.id_num
FROM carnivalpricing as t1, carnivalpricing as t2
where t1.shipname <> t2.shipname and t1.saildate <> t2.saildate and t1.itinerary <> t2.itinerary
and t1.interiorprice <> t2.interiorprice and t1.oceanviewprice <> t2.oceanviewprice
and t1.balconyprice <> t2.balconyprice
and t1.ratetype <> t2.ratetype and t1.suiteprice <> t2.suiteprice
order by saildate, interiorprice


I would select only duplicate entries save time and id_num column, and then insert into a new table,

but this didnt work, the result for this query was more duplicates.


Anyone have any ideas?

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-08-11 : 12:51:55
This will give you duplicate entries.
SELECT t1.shipname,  t1.saildate, t1.itinerary, t1.interiorprice, t1.oceanviewprice,
t1.balconyprice, t1.detailslink, t1.time, t1.ratetype, t1.suiteprice,
t1.id_num
FROM carnivalpricing as t1
LEFT JOIN (SELECT shipname, saildate, itinerary, interiorprice, oceanviewprice,
balconyprice, detailslink, ratetype, suiteprice,MIN(id_num) AS MinID FROM carnivalpricing
GROUP BY shipname, saildate, itinerary, interiorprice, oceanviewprice,
balconyprice, detailslink, ratetype, suiteprice)t2
ON t2.shipname = t1.shipname
AND t2.saildate = t1.saildate
AND t2.itinerary =t1.itinerary
AND t2.interiorprice =t1.interiorprice
AND t2.oceanviewprice=t1.oceanviewprice
AND t2.balconyprice=t1.balconyprice
AND t2.detailslink=t1.detailslink
AND t2.ratetype=t1.ratetype
AND t2.suiteprice=t1.suiteprice
AND t2.MinID=t1.id_num
WHERE t2.MinID IS NULL
Go to Top of Page

sqlchiq
Posting Yak Master

133 Posts

Posted - 2008-08-11 : 12:57:11
quote:
Originally posted by visakh16

This will give you duplicate entries.
SELECT t1.shipname,  t1.saildate, t1.itinerary, t1.interiorprice, t1.oceanviewprice,
t1.balconyprice, t1.detailslink, t1.time, t1.ratetype, t1.suiteprice,
t1.id_num
FROM carnivalpricing as t1
LEFT JOIN (SELECT shipname, saildate, itinerary, interiorprice, oceanviewprice,
balconyprice, detailslink, ratetype, suiteprice,MIN(id_num) AS MinID FROM carnivalpricing
GROUP BY shipname, saildate, itinerary, interiorprice, oceanviewprice,
balconyprice, detailslink, ratetype, suiteprice)t2
ON t2.shipname = t1.shipname
AND t2.saildate = t1.saildate
AND t2.itinerary =t1.itinerary
AND t2.interiorprice =t1.interiorprice
AND t2.oceanviewprice=t1.oceanviewprice
AND t2.balconyprice=t1.balconyprice
AND t2.detailslink=t1.detailslink
AND t2.ratetype=t1.ratetype
AND t2.suiteprice=t1.suiteprice
AND t2.MinID=t1.id_num
WHERE t2.MinID IS NULL




then theoretically I can find uniques rows if I did...



[SELECT t1.shipname, t1.saildate, t1.itinerary, t1.interiorprice, t1.oceanviewprice,
t1.balconyprice, t1.detailslink, t1.time, t1.ratetype, t1.suiteprice,
t1.id_num
FROM carnivalpricing as t1
LEFT JOIN (SELECT shipname, saildate, itinerary, interiorprice, oceanviewprice,
balconyprice, detailslink, ratetype, suiteprice,MIN(id_num) AS MinID FROM carnivalpricing
GROUP BY shipname, saildate, itinerary, interiorprice, oceanviewprice,
balconyprice, detailslink, ratetype, suiteprice)t2
ON t2.shipname <> t1.shipname
AND t2.saildate <> t1.saildate
AND t2.itinerary <>t1.itinerary
AND t2.interiorprice <>t1.interiorprice
AND t2.oceanviewprice<>t1.oceanviewprice
AND t2.balconyprice<>t1.balconyprice
AND t2.detailslink<>t1.detailslink
AND t2.ratetype<>t1.ratetype
AND t2.suiteprice<>t1.suiteprice
AND t2.MinID<>t1.id_num
WHERE t2.MinID IS NULL




Right?
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-08-11 : 13:04:09
Nope for that you need just a small change


SELECT t1.shipname,  t1.saildate, t1.itinerary, t1.interiorprice, t1.oceanviewprice,
t1.balconyprice, t1.detailslink, t1.time, t1.ratetype, t1.suiteprice,
t1.id_num
FROM carnivalpricing as t1
LEFTINNER JOIN (SELECT shipname, saildate, itinerary, interiorprice, oceanviewprice,
balconyprice, detailslink, ratetype, suiteprice,MIN(id_num) AS MinID FROM carnivalpricing
GROUP BY shipname, saildate, itinerary, interiorprice, oceanviewprice,
balconyprice, detailslink, ratetype, suiteprice)t2
ON t2.shipname = t1.shipname
AND t2.saildate = t1.saildate
AND t2.itinerary =t1.itinerary
AND t2.interiorprice =t1.interiorprice
AND t2.oceanviewprice=t1.oceanviewprice
AND t2.balconyprice=t1.balconyprice
AND t2.detailslink=t1.detailslink
AND t2.ratetype=t1.ratetype
AND t2.suiteprice=t1.suiteprice
AND t2.MinID=t1.id_num
WHERE t2.MinID IS NULL
Go to Top of Page

sqlchiq
Posting Yak Master

133 Posts

Posted - 2008-08-11 : 13:52:22
quote:
Originally posted by visakh16

Nope for that you need just a small change


SELECT t1.shipname,  t1.saildate, t1.itinerary, t1.interiorprice, t1.oceanviewprice,
t1.balconyprice, t1.detailslink, t1.time, t1.ratetype, t1.suiteprice,
t1.id_num
FROM carnivalpricing as t1
LEFTINNER JOIN (SELECT shipname, saildate, itinerary, interiorprice, oceanviewprice,
balconyprice, detailslink, ratetype, suiteprice,MIN(id_num) AS MinID FROM carnivalpricing
GROUP BY shipname, saildate, itinerary, interiorprice, oceanviewprice,
balconyprice, detailslink, ratetype, suiteprice)t2
ON t2.shipname = t1.shipname
AND t2.saildate = t1.saildate
AND t2.itinerary =t1.itinerary
AND t2.interiorprice =t1.interiorprice
AND t2.oceanviewprice=t1.oceanviewprice
AND t2.balconyprice=t1.balconyprice
AND t2.detailslink=t1.detailslink
AND t2.ratetype=t1.ratetype
AND t2.suiteprice=t1.suiteprice
AND t2.MinID=t1.id_num
WHERE t2.MinID IS NULL




executed with no results, I also tried my idea and came back with results but couldnt confirm it was duplicate free because everything was so spread out. Running mine with an 'order by' to double check but its taking forever to execute.
Go to Top of Page

sqlchiq
Posting Yak Master

133 Posts

Posted - 2008-08-11 : 13:53:26
quote:
Originally posted by sqlchiq

[quote]Originally posted by visakh16

Nope for that you need just a small change


SELECT t1.shipname,  t1.saildate, t1.itinerary, t1.interiorprice, t1.oceanviewprice,
t1.balconyprice, t1.detailslink, t1.time, t1.ratetype, t1.suiteprice,
t1.id_num
FROM carnivalpricing as t1
LEFTINNER JOIN (SELECT shipname, saildate, itinerary, interiorprice, oceanviewprice,
balconyprice, detailslink, ratetype, suiteprice,MIN(id_num) AS MinID FROM carnivalpricing
GROUP BY shipname, saildate, itinerary, interiorprice, oceanviewprice,
balconyprice, detailslink, ratetype, suiteprice)t2
ON t2.shipname = t1.shipname
AND t2.saildate = t1.saildate
AND t2.itinerary =t1.itinerary
AND t2.interiorprice =t1.interiorprice
AND t2.oceanviewprice=t1.oceanviewprice
AND t2.balconyprice=t1.balconyprice
AND t2.detailslink=t1.detailslink
AND t2.ratetype=t1.ratetype
AND t2.suiteprice=t1.suiteprice
AND t2.MinID=t1.id_num
WHERE t2.MinID IS NULL




executed with no results, I also tried my idea with the <>s and got some results but I couldnt confirm it was duplicate free because everything was so spread out. Running mine with an 'order by' to double check but its taking forever to execute.

Anymore ideas?
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-08-11 : 14:03:41
Are you sure your id_num field has unique value?
Go to Top of Page

sqlchiq
Posting Yak Master

133 Posts

Posted - 2008-08-11 : 14:16:42
quote:
Originally posted by visakh16

Are you sure your id_num field has unique value?



for some rows id_num is also duplicated in the entire entry, for others it is not. its a little weird
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-08-11 : 14:28:02
quote:
Originally posted by sqlchiq

quote:
Originally posted by visakh16

Are you sure your id_num field has unique value?



for some rows id_num is also duplicated in the entire entry, for others it is not. its a little weird


dont you have any column in table that has only unique values?
Go to Top of Page

sqlchiq
Posting Yak Master

133 Posts

Posted - 2008-08-11 : 14:34:45
actually scratch what I said, You're right. I did an order by on id_num and every entry for this column is unique.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-08-11 : 14:38:57
quote:
Originally posted by sqlchiq

actually scratch what I said, You're right. I did an order by on id_num and every entry for this column is unique.


then my earlier query must work
Go to Top of Page

sqlchiq
Posting Yak Master

133 Posts

Posted - 2008-08-11 : 15:06:12
quote:
Originally posted by visakh16

quote:
Originally posted by sqlchiq

actually scratch what I said, You're right. I did an order by on id_num and every entry for this column is unique.


then my earlier query must work



There are NULL values for the Ratetype and Suiteprice columns, do you think that may have something to do with it ?
Go to Top of Page

sqlchiq
Posting Yak Master

133 Posts

Posted - 2008-08-11 : 15:18:17
quote:
Originally posted by sqlchiq

quote:
Originally posted by visakh16

quote:
Originally posted by sqlchiq

actually scratch what I said, You're right. I did an order by on id_num and every entry for this column is unique.


then my earlier query must work



There are NULL values for the Ratetype and Suiteprice columns, do you think that may have something to do with it ?


I think I found the problem, every value in the detailslink column had "NULL" So during the this part..

t2.detailslink=t1.detailslink, the query found this match and made the where clause invalid for every entry. resulting 0 entries.

is that correct?


thank you so much for your help vishakh!
Go to Top of Page

sqlchiq
Posting Yak Master

133 Posts

Posted - 2008-08-11 : 17:04:13
Another problem has risen,

in the time column, some entries are like so....


ShipName | SailDate | Itinerary | InteriorPrice | OceanViewPrice | BalconyPrice | DetailsLink | Time | id_num | RateType | SuitePrice
VI | Sep 9, 2009 | 7 Day SOUTHERN CARIBBEAN from $593 * | 593 | 673 | $863* | NULL | 7/24/2008 9:00:02 AM | 66298 | NULL | NULL
VI | Sep 9, 2009 | 7 Day SOUTHERN CARIBBEAN from $593 * | 593 | 673 | $863* | NULL | 7/24/2008 9:30:10 AM | 67960 | NULL | NULL




because of the difference of a few minutes in the time column, sql thinks its two different entries when really its a duplicate.

I tried to solve it by putting convert(varchar, convert(datetime, t1.time),101) in the select statement but it just spits out




shipname | saildate | itinerary | interiorprice | oceanviewprice | balconyprice | detailslink | executiontime | ratetype | suiteprice | id_num
VI | Sep 9, 2009 | 7 Day SOUTHERN CARIBBEAN from $593 * | 593 | 673 | $863* | NULL | 07/24/2008 | NULL | NULL | 66096
VI | Sep 9, 2009 | 7 Day SOUTHERN CARIBBEAN from $593 * | 593 | 673 | $863* | NULL | 07/24/2008 | NULL | NULL | 67960


when it should only show one copy of that
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-08-11 : 23:34:54
Not sure how you got both the records? we're always looking for minimun id records so we should only get record with id_num 66298 out of two on top.
Go to Top of Page

sqlchiq
Posting Yak Master

133 Posts

Posted - 2008-08-12 : 20:31:45
quote:
Originally posted by visakh16

Not sure how you got both the records? we're always looking for minimun id records so we should only get record with id_num 66298 out of two on top.



I ended up converting the dates first, then inserting into a new table, and then running the code. not the most elegant thing but it works.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-08-13 : 00:36:19
converting the dates? to what?
Go to Top of Page
   

- Advertisement -