Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
sqlchiq
Posting Yak Master
133 Posts |
Posted - 2008-08-11 : 12:39:45
|
I'm trying to remove duplicate entries over an entire row that spans multiple columns...In my table I have the following columnsshipname, saildate, itinerary, interiorprice, oceanviewprice,balconyprice, detailslink, time, ratetype, suiteprice, id_numEach row represents a trip, with the above details.I have a bunch of duplicates entries, I know how to deal with the problem when only one column is duplicates, but here the entire row is a duplicate entry. (except for the id_num and time column, which may actually be unique)I used the following code to find all dupesSELECT t1.shipname, t1.saildate, t1.itinerary, t1.interiorprice, t1.oceanviewprice,t1.balconyprice, t1.detailslink, t1.time, t1.ratetype, t1.suiteprice,t1.id_numFROM carnivalpricing as t1, carnivalpricing as t2where t1.shipname = t2.shipname and t1.saildate = t2.saildate and t1.itinerary = t2.itineraryand t1.interiorprice = t2.interiorprice and t1.oceanviewprice = t2.oceanviewpriceand t1.balconyprice = t2.balconypriceand t1.ratetype = t2.ratetype and t1.suiteprice = t2.suitepriceorder by saildate, interiorprice and it seemed to pull up 13,000 rows of duplicates.So i thought if I just replaced the = with <> like so...SELECT t1.shipname, t1.saildate, t1.itinerary, t1.interiorprice, t1.oceanviewprice,t1.balconyprice, t1.detailslink, t1.executiontime, t1.ratetype, t1.suiteprice,t1.id_numFROM carnivalpricing as t1, carnivalpricing as t2where t1.shipname <> t2.shipname and t1.saildate <> t2.saildate and t1.itinerary <> t2.itineraryand t1.interiorprice <> t2.interiorprice and t1.oceanviewprice <> t2.oceanviewpriceand t1.balconyprice <> t2.balconypriceand t1.ratetype <> t2.ratetype and t1.suiteprice <> t2.suitepriceorder by saildate, interiorprice I would select only duplicate entries save time and id_num column, and then insert into a new table,but this didnt work, the result for this query was more duplicates.Anyone have any ideas? |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2008-08-11 : 12:51:55
|
This will give you duplicate entries.SELECT t1.shipname, t1.saildate, t1.itinerary, t1.interiorprice, t1.oceanviewprice,t1.balconyprice, t1.detailslink, t1.time, t1.ratetype, t1.suiteprice,t1.id_numFROM carnivalpricing as t1LEFT JOIN (SELECT shipname, saildate, itinerary, interiorprice, oceanviewprice,balconyprice, detailslink, ratetype, suiteprice,MIN(id_num) AS MinID FROM carnivalpricingGROUP BY shipname, saildate, itinerary, interiorprice, oceanviewprice,balconyprice, detailslink, ratetype, suiteprice)t2ON t2.shipname = t1.shipnameAND t2.saildate = t1.saildateAND t2.itinerary =t1.itineraryAND t2.interiorprice =t1.interiorpriceAND t2.oceanviewprice=t1.oceanviewpriceAND t2.balconyprice=t1.balconypriceAND t2.detailslink=t1.detailslinkAND t2.ratetype=t1.ratetypeAND t2.suiteprice=t1.suitepriceAND t2.MinID=t1.id_numWHERE t2.MinID IS NULL |
 |
|
|
sqlchiq
Posting Yak Master
133 Posts |
Posted - 2008-08-11 : 12:57:11
|
quote: Originally posted by visakh16 This will give you duplicate entries.SELECT t1.shipname, t1.saildate, t1.itinerary, t1.interiorprice, t1.oceanviewprice,t1.balconyprice, t1.detailslink, t1.time, t1.ratetype, t1.suiteprice,t1.id_numFROM carnivalpricing as t1LEFT JOIN (SELECT shipname, saildate, itinerary, interiorprice, oceanviewprice,balconyprice, detailslink, ratetype, suiteprice,MIN(id_num) AS MinID FROM carnivalpricingGROUP BY shipname, saildate, itinerary, interiorprice, oceanviewprice,balconyprice, detailslink, ratetype, suiteprice)t2ON t2.shipname = t1.shipnameAND t2.saildate = t1.saildateAND t2.itinerary =t1.itineraryAND t2.interiorprice =t1.interiorpriceAND t2.oceanviewprice=t1.oceanviewpriceAND t2.balconyprice=t1.balconypriceAND t2.detailslink=t1.detailslinkAND t2.ratetype=t1.ratetypeAND t2.suiteprice=t1.suitepriceAND t2.MinID=t1.id_numWHERE t2.MinID IS NULL
then theoretically I can find uniques rows if I did...[SELECT t1.shipname, t1.saildate, t1.itinerary, t1.interiorprice, t1.oceanviewprice,t1.balconyprice, t1.detailslink, t1.time, t1.ratetype, t1.suiteprice,t1.id_numFROM carnivalpricing as t1LEFT JOIN (SELECT shipname, saildate, itinerary, interiorprice, oceanviewprice,balconyprice, detailslink, ratetype, suiteprice,MIN(id_num) AS MinID FROM carnivalpricingGROUP BY shipname, saildate, itinerary, interiorprice, oceanviewprice,balconyprice, detailslink, ratetype, suiteprice)t2ON t2.shipname <> t1.shipnameAND t2.saildate <> t1.saildateAND t2.itinerary <>t1.itineraryAND t2.interiorprice <>t1.interiorpriceAND t2.oceanviewprice<>t1.oceanviewpriceAND t2.balconyprice<>t1.balconypriceAND t2.detailslink<>t1.detailslinkAND t2.ratetype<>t1.ratetypeAND t2.suiteprice<>t1.suitepriceAND t2.MinID<>t1.id_numWHERE t2.MinID IS NULL Right? |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2008-08-11 : 13:04:09
|
Nope for that you need just a small changeSELECT t1.shipname, t1.saildate, t1.itinerary, t1.interiorprice, t1.oceanviewprice,t1.balconyprice, t1.detailslink, t1.time, t1.ratetype, t1.suiteprice,t1.id_numFROM carnivalpricing as t1LEFTINNER JOIN (SELECT shipname, saildate, itinerary, interiorprice, oceanviewprice,balconyprice, detailslink, ratetype, suiteprice,MIN(id_num) AS MinID FROM carnivalpricingGROUP BY shipname, saildate, itinerary, interiorprice, oceanviewprice,balconyprice, detailslink, ratetype, suiteprice)t2ON t2.shipname = t1.shipnameAND t2.saildate = t1.saildateAND t2.itinerary =t1.itineraryAND t2.interiorprice =t1.interiorpriceAND t2.oceanviewprice=t1.oceanviewpriceAND t2.balconyprice=t1.balconypriceAND t2.detailslink=t1.detailslinkAND t2.ratetype=t1.ratetypeAND t2.suiteprice=t1.suitepriceAND t2.MinID=t1.id_numWHERE t2.MinID IS NULL |
 |
|
|
sqlchiq
Posting Yak Master
133 Posts |
Posted - 2008-08-11 : 13:52:22
|
quote: Originally posted by visakh16 Nope for that you need just a small changeSELECT t1.shipname, t1.saildate, t1.itinerary, t1.interiorprice, t1.oceanviewprice,t1.balconyprice, t1.detailslink, t1.time, t1.ratetype, t1.suiteprice,t1.id_numFROM carnivalpricing as t1LEFTINNER JOIN (SELECT shipname, saildate, itinerary, interiorprice, oceanviewprice,balconyprice, detailslink, ratetype, suiteprice,MIN(id_num) AS MinID FROM carnivalpricingGROUP BY shipname, saildate, itinerary, interiorprice, oceanviewprice,balconyprice, detailslink, ratetype, suiteprice)t2ON t2.shipname = t1.shipnameAND t2.saildate = t1.saildateAND t2.itinerary =t1.itineraryAND t2.interiorprice =t1.interiorpriceAND t2.oceanviewprice=t1.oceanviewpriceAND t2.balconyprice=t1.balconypriceAND t2.detailslink=t1.detailslinkAND t2.ratetype=t1.ratetypeAND t2.suiteprice=t1.suitepriceAND t2.MinID=t1.id_numWHERE t2.MinID IS NULL
executed with no results, I also tried my idea and came back with results but couldnt confirm it was duplicate free because everything was so spread out. Running mine with an 'order by' to double check but its taking forever to execute. |
 |
|
|
sqlchiq
Posting Yak Master
133 Posts |
Posted - 2008-08-11 : 13:53:26
|
quote: Originally posted by sqlchiq [quote]Originally posted by visakh16 Nope for that you need just a small changeSELECT t1.shipname, t1.saildate, t1.itinerary, t1.interiorprice, t1.oceanviewprice,t1.balconyprice, t1.detailslink, t1.time, t1.ratetype, t1.suiteprice,t1.id_numFROM carnivalpricing as t1LEFTINNER JOIN (SELECT shipname, saildate, itinerary, interiorprice, oceanviewprice,balconyprice, detailslink, ratetype, suiteprice,MIN(id_num) AS MinID FROM carnivalpricingGROUP BY shipname, saildate, itinerary, interiorprice, oceanviewprice,balconyprice, detailslink, ratetype, suiteprice)t2ON t2.shipname = t1.shipnameAND t2.saildate = t1.saildateAND t2.itinerary =t1.itineraryAND t2.interiorprice =t1.interiorpriceAND t2.oceanviewprice=t1.oceanviewpriceAND t2.balconyprice=t1.balconypriceAND t2.detailslink=t1.detailslinkAND t2.ratetype=t1.ratetypeAND t2.suiteprice=t1.suitepriceAND t2.MinID=t1.id_numWHERE t2.MinID IS NULL
executed with no results, I also tried my idea with the <>s and got some results but I couldnt confirm it was duplicate free because everything was so spread out. Running mine with an 'order by' to double check but its taking forever to execute.Anymore ideas? |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2008-08-11 : 14:03:41
|
| Are you sure your id_num field has unique value? |
 |
|
|
sqlchiq
Posting Yak Master
133 Posts |
Posted - 2008-08-11 : 14:16:42
|
quote: Originally posted by visakh16 Are you sure your id_num field has unique value?
for some rows id_num is also duplicated in the entire entry, for others it is not. its a little weird |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2008-08-11 : 14:28:02
|
quote: Originally posted by sqlchiq
quote: Originally posted by visakh16 Are you sure your id_num field has unique value?
for some rows id_num is also duplicated in the entire entry, for others it is not. its a little weird
dont you have any column in table that has only unique values? |
 |
|
|
sqlchiq
Posting Yak Master
133 Posts |
Posted - 2008-08-11 : 14:34:45
|
| actually scratch what I said, You're right. I did an order by on id_num and every entry for this column is unique. |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2008-08-11 : 14:38:57
|
quote: Originally posted by sqlchiq actually scratch what I said, You're right. I did an order by on id_num and every entry for this column is unique.
then my earlier query must work |
 |
|
|
sqlchiq
Posting Yak Master
133 Posts |
Posted - 2008-08-11 : 15:06:12
|
quote: Originally posted by visakh16
quote: Originally posted by sqlchiq actually scratch what I said, You're right. I did an order by on id_num and every entry for this column is unique.
then my earlier query must work
There are NULL values for the Ratetype and Suiteprice columns, do you think that may have something to do with it ? |
 |
|
|
sqlchiq
Posting Yak Master
133 Posts |
Posted - 2008-08-11 : 15:18:17
|
quote: Originally posted by sqlchiq
quote: Originally posted by visakh16
quote: Originally posted by sqlchiq actually scratch what I said, You're right. I did an order by on id_num and every entry for this column is unique.
then my earlier query must work
There are NULL values for the Ratetype and Suiteprice columns, do you think that may have something to do with it ?
I think I found the problem, every value in the detailslink column had "NULL" So during the this part..t2.detailslink=t1.detailslink, the query found this match and made the where clause invalid for every entry. resulting 0 entries.is that correct?thank you so much for your help vishakh! |
 |
|
|
sqlchiq
Posting Yak Master
133 Posts |
Posted - 2008-08-11 : 17:04:13
|
| Another problem has risen,in the time column, some entries are like so....ShipName | SailDate | Itinerary | InteriorPrice | OceanViewPrice | BalconyPrice | DetailsLink | Time | id_num | RateType | SuitePriceVI | Sep 9, 2009 | 7 Day SOUTHERN CARIBBEAN from $593 * | 593 | 673 | $863* | NULL | 7/24/2008 9:00:02 AM | 66298 | NULL | NULLVI | Sep 9, 2009 | 7 Day SOUTHERN CARIBBEAN from $593 * | 593 | 673 | $863* | NULL | 7/24/2008 9:30:10 AM | 67960 | NULL | NULLbecause of the difference of a few minutes in the time column, sql thinks its two different entries when really its a duplicate.I tried to solve it by putting convert(varchar, convert(datetime, t1.time),101) in the select statement but it just spits outshipname | saildate | itinerary | interiorprice | oceanviewprice | balconyprice | detailslink | executiontime | ratetype | suiteprice | id_numVI | Sep 9, 2009 | 7 Day SOUTHERN CARIBBEAN from $593 * | 593 | 673 | $863* | NULL | 07/24/2008 | NULL | NULL | 66096VI | Sep 9, 2009 | 7 Day SOUTHERN CARIBBEAN from $593 * | 593 | 673 | $863* | NULL | 07/24/2008 | NULL | NULL | 67960when it should only show one copy of that |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2008-08-11 : 23:34:54
|
| Not sure how you got both the records? we're always looking for minimun id records so we should only get record with id_num 66298 out of two on top. |
 |
|
|
sqlchiq
Posting Yak Master
133 Posts |
Posted - 2008-08-12 : 20:31:45
|
quote: Originally posted by visakh16 Not sure how you got both the records? we're always looking for minimun id records so we should only get record with id_num 66298 out of two on top.
I ended up converting the dates first, then inserting into a new table, and then running the code. not the most elegant thing but it works. |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2008-08-13 : 00:36:19
|
| converting the dates? to what? |
 |
|
|
|
|
|
|
|