| Author |
Topic |
|
mike.bull
Starting Member
9 Posts |
Posted - 2008-05-13 : 09:42:37
|
Hey There. I'm in the process of doing a major data clean up and I'm just wondering how I would go about eliminating some redundant data.The Table LayoutContractsCNTRID CONTRACTNUM STARTDATE CUSTOMNUM=======================================================0 1234567 091885 A 1 1234567 091885 A 2 1111111 111111 B3 1234567 081205 A EquipmentEQUIPID DEVICENAME CNTRID CUSTOMNUM=======================================================0 DEVICE1 0 A1 DEVICE2 2 B2 DEVICE3 1 A3 DEVICE4 3 A You will notice that each customer may have multiple devices. Each device may be tied to a contract, and each contract may have one or more devices tied to it. In the example above, you will notice in the contracts table the contracts with the IDs 0 and 1.Fig 1.CNTRID CONTRACTNUM STARTDATE CUSTOMNUM=======================================================0 1234567 091885 A 1 1234567 091885 A These contracts have the exact same information. Furthermore, if you look down the table you will notice the contract with the ID 3.Fig 2.CNTRID CONTRACTNUM STARTDATE CUSTOMNUM=======================================================3 1234567 081205 A This contract shares the same contract and customer number, but has a different start date.Now lets take a look devices in the equipment table that refer to these records.EQUIPID DEVICENAME CNTRID CUSTOMNUM=======================================================0 DEVICE1 0 A2 DEVICE3 1 A3 DEVICE4 3 A You will notice that DEVICE1 and DEVICE 3 refer to the contract records that contain identical data. (As shown in 'Fig 1')My question is as follows:How do I eliminate the any duplicate records from the contracts table, and update the records in the equipment table with id of the left over contract.Results Should be as follows: ContractsCNTRID CONTRACTNUM STARTDATE CUSTOMNUM=======================================================0 1234567 091885 A 2 1111111 111111 B3 1234567 081205 A EquipmentEQUIPID DEVICENAME CNTRID CUSTOMNUM=======================================================0 DEVICE1 0 A1 DEVICE2 2 B2 DEVICE3 0 A3 DEVICE4 3 A Any help you may provide would be greatly appreciated!Thanks--mike |
|
|
cat_jesus
Aged Yak Warrior
547 Posts |
Posted - 2008-05-13 : 09:51:28
|
| This topic has been covered many times before. You can do a search on it, but I'll give you a quick answer. Do a distinct insert into a temp table of all your records, truncate your source table and then insert back to the source from the temp.Be sure to test it first.An infinite universe is the ultimate cartesian product. |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2008-05-13 : 10:49:58
|
This will do update:-;With CTE(RowNo,CNTRID,CONTRACTNUM,STARTDATE,CUSTOMNUM) AS(SELECT ROW_NUMBER() OVER (PARTITION BY CONTRACTNUM,STARTDATE,CUSTOMNUM ORDER BY CNTRID) AS RowNo,*FROM Contracts)UPDATE eSET e.CNTRID=c2.CNTRIDFROM Equipment eINNER JOIN CTE c1ON c.CNTRID =e.CNTRID AND c.CUSTOMNUM=e.CUSTOMNUMINNER JOIN CTE c2ON c2.CONTRACTNUM=c1.CONTRACTNUMAND c2.STARTDATE=c1.STARTDATEAND c2.CUSTOMNUM=c1.CUSTOMNUMAND c2.RowNo=1AND c1.RowNo>1 and this will delete the duplicates:-DELETE tFROM(SELECT ROW_NUMBER() OVER (PARTITION BY CONTRACTNUM,STARTDATE,CUSTOMNUM ORDER BY CNTRID) AS RowNo,*FROM Contracts)tWHERE t.RowNo >1 |
 |
|
|
mike.bull
Starting Member
9 Posts |
Posted - 2008-05-13 : 12:24:22
|
| Thanks a lot. Seem like it will work great. |
 |
|
|
mike.bull
Starting Member
9 Posts |
Posted - 2008-05-13 : 14:42:34
|
| Hey visakh16, I implemented this query (filling in the fields that i left out of the example). Before i ran the query I did a select distinct of all the fields except the identity field. The result was 300+ records. However, after i ran the query above, I'm left with less then 200 records. Should I not have the same number of rows as the select distinct?mike |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2008-05-13 : 14:47:10
|
quote: Originally posted by mike.bull Hey visakh16, I implemented this query (filling in the fields that i left out of the example). Before i ran the query I did a select distinct of all the fields except the identity field. The result was 300+ records. However, after i ran the query above, I'm left with less then 200 records. Should I not have the same number of rows as the select distinct?mike
Cant say they will be same. Depends on what the values of filled in fields were. If some of those fields had distinct values the count can be different.Are you sure you wont get duplicate values for other columns for unique CONTRACTNUM,STARTDATE,CUSTOMNUM combination? |
 |
|
|
mike.bull
Starting Member
9 Posts |
Posted - 2008-05-13 : 14:59:47
|
| Hey,I outputted the results before and after to a CSV.Before:A0006,2419653,,20040425,,NO,NO,,,NULL,NULLA0006,C00100,,20031001,,NO,NO,,,NULL,NULLA0203,3487684,NULL,,,8x5x4,,,,,A0203,3487684,,20070420,,,,,,NULL,NULLB0002,2002184,NULL,,,YES,,,,,After:A0006,2419653,,20040425,,NO,NO,,,NULL,NULLA0006,C00100,,20031001,,NO,NO,,,NULL,NULLA0203,3487684,,20070420,,,,,,NULL,NULLB0002,2002184,NULL,,,YES,,,,,The first 5 results show a missing record for A0203.Do you think null values messing up the inner join?mike |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2008-05-13 : 15:01:55
|
quote: Originally posted by mike.bull Hey,I outputted the results before and after to a CSV.Before:A0006,2419653,,20040425,,NO,NO,,,NULL,NULLA0006,C00100,,20031001,,NO,NO,,,NULL,NULLA0203,3487684,NULL,,,8x5x4,,,,,A0203,3487684,,20070420,,,,,,NULL,NULLB0002,2002184,NULL,,,YES,,,,,After:A0006,2419653,,20040425,,NO,NO,,,NULL,NULLA0006,C00100,,20031001,,NO,NO,,,NULL,NULLA0203,3487684,,20070420,,,,,,NULL,NULLB0002,2002184,NULL,,,YES,,,,,The first 5 results show a missing record for A0203.Do you think null values messing up the inner join?mike
With the limited info you provuide, i'm unable to tell. Can you just post the values with table columns names of these 5 records before and after? |
 |
|
|
mike.bull
Starting Member
9 Posts |
Posted - 2008-05-13 : 15:42:19
|
| [code]BEFORE|CUSTOMNUM | CONTRACT | SMARTNET | SMARTNET_STOP_DATE | SMARTNET_START_DATE | ONSITE | PREMIUM_ONSITE | ONSITE_START_DATE | ONSITE_STOP_DATE | ONSITE_RENEW_DATE | NOTEA0006|2419653| |20040425| |NO |NO| | |NULL|NULLA0006|C00100 | |20031001| |NO |NO| | |NULL|NULLA0203|3487684|NULL| | |8x5x4| | | | |A0203|3487684| |20070420| | | | | |NULL|NULLB0002|2002184|NULL| | |YES | | | | |AFTER|CUSTOMNUM | CONTRACT | SMARTNET | SMARTNET_STOP_DATE | SMARTNET_START_DATE | ONSITE | PREMIUM_ONSITE | ONSITE_START_DATE | ONSITE_STOP_DATE | ONSITE_RENEW_DATE | NOTEA0006|2419653| |20040425| |NO |NO| | |NULL|NULLA0006|C00100 | |20031001| |NO |NO| | |NULL|NULLA0203|3487684| |20070420| | | | | |NULL|NULLB0002|2002184|NULL | | |YES| | | | |This is the query I ended up withWith CTE(RowNo, CUSTCNTR_ID, CUSTOMNUM, CONTRACT, SMARTNET, SMARTNET_START_DATE, SMARTNET_STOP_DATE, ONSITE, PREMIUM_ONSITE, ONSITE_START_DATE, ONSITE_STOP_DATE, ONSITE_RENEW_DATE, NOTE) AS(SELECT ROW_NUMBER() OVER (PARTITION BY CONTRACT, CUSTOMNUM, SMARTNET, SMARTNET_START_DATE, SMARTNET_STOP_DATE, ONSITE, PREMIUM_ONSITE, ONSITE_START_DATE, ONSITE_STOP_DATE, ONSITE_RENEW_DATE, NOTE ORDER BY CUSTCNTR_ID) AS RowNo, * FROM ooss.ontrack_contracts)UPDATE eSET e.CUSTCNTR_ID=c2.CUSTCNTR_IDFROM ooss.ontrack_customerequiptmp e INNER JOIN CTE c1 ON c1.CUSTCNTR_ID = e.CUSTCNTR_ID AND c1.CUSTOMNUM = e.CUSTOMNUM INNER JOIN CTE c2 ON c2.CONTRACT=c1.CONTRACT AND c2.SMARTNET_START_DATE=c1.SMARTNET_START_DATE AND c2.CUSTOMNUM=c1.CUSTOMNUM AND c2.SMARTNET_STOP_DATE = c1.SMARTNET_STOP_DATE AND c2.SMARTNET = c1.SMARTNET AND c2.ONSITE = c1.ONSITE AND c2.PREMIUM_ONSITE = c1.PREMIUM_ONSITE AND c2.ONSITE_START_DATE = c1.ONSITE_START_DATE AND ISNULL(c2.ONSITE_RENEW_DATE, '') = ISNULL(c1.ONSITE_RENEW_DATE, '') AND ISNULL(c2.NOTE, '') = ISNULL(c1.NOTE, '') AND c2.RowNo=1 AND c1.RowNo>1DELETE tFROM(SELECT ROW_NUMBER() OVER (PARTITION BY SMARTNET, SMARTNET_START_DATE, SMARTNET_STOP_DATE, ONSITE, PREMIUM_ONSITE, ONSITE_START_DATE, ONSITE_STOP_DATE, ONSITE_RENEW_DATE, NOTE ORDER BY CUSTCNTR_ID) AS RowNo, *FROM ooss.ontrack_contracts) tWHERE t.RowNo >1[/code] |
 |
|
|
mike.bull
Starting Member
9 Posts |
Posted - 2008-05-14 : 08:54:21
|
| Well its fixed. Thanks for your help. It turns out, that in the partition statement I had customnum and contract reversed. I fixed this and I have the appropriate results.Thanks again for all your help!Mike |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2008-05-14 : 09:36:43
|
Ok?The order of tables in PARTITION BY does not matter. E 12°55'05.25"N 56°04'39.16" |
 |
|
|
mike.bull
Starting Member
9 Posts |
Posted - 2008-05-14 : 10:21:19
|
My apologizes, I just compared the new and the old queries once again, Peso is right the order of the columns in the Partition statement have no effect on the results. The issue lied in the delete statement at the bottom. DELETE tFROM(SELECT ROW_NUMBER() OVER (PARTITION BY SMARTNET, SMARTNET_START_DATE, SMARTNET_STOP_DATE, ONSITE, PREMIUM_ONSITE, ONSITE_START_DATE, ONSITE_STOP_DATE, ONSITE_RENEW_DATE, NOTE ORDER BY CUSTCNTR_ID) AS RowNo, *FROM ooss.ontrack_contracts) tWHERE t.RowNo >1 You will notice that I have not included the CUSTOMNUM and the CONTRACT fields in the partition statement here. Whoops.Anyway, sorry for misleading.Mike |
 |
|
|
|
|
|