Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
HaYen
Starting Member
6 Posts |
Posted - 2011-06-18 : 03:59:29
|
Here is the scenario, we have a Customer DB that has duplicate Customers with different CUSTID values. We plan on keeping the oldest CUSTOMER Record and merge any newer records into the Oldest one and cancel the newer records:tblDUPS:CUSTNAME CUSTID CUSTID2 MatchFieldRichard Smith 101 102 NAMEDick Smith 102 105 NAMERobert Bug 103 104 CITYDickie Smith 105 106 CITY The objective is relate all three rows to the smallest CUSTID and PIVOT the MatchField values:RESULT SET CUSTID1 CUSTID2 NAME DLN CITY101 102 Y N N101 105 N Y N101 106 N N Y103 104 N N Y In the example above, I have already identified the duplicate rows in the the Customer Table and populated table tblDUPS. I'm just having mental block on rendering the last result set. There is no number to the amount of duplicates that can be in the table. In the example above CUSTID's 101,102,105, and 106 all belong to the same customer. Eventually, we will roll all purchase history into the smallest (oldest) CUSTID's (101,103) and Cancel the high CUSTID's (102,104,105,106).Any idea's? |
|
|
HaYen
Starting Member
6 Posts |
Posted - 2011-06-21 : 02:54:55
|
| BBT |
 |
|
|
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts |
Posted - 2011-06-21 : 11:09:48
|
| How do you work out that all of these:CUSTNAME CUSTID CUSTID2 MatchFieldRichard Smith 101 102 NAMEDick Smith 102 105 NAMERobert Bug 103 104 CITYDickie Smith 106 105 CITYAll belong to the same customer? There's nothing there that I can see indicates that.You'll need to tell us *how* to define which rows belong to a particular customer.Charlie===============================================================Msg 3903, Level 16, State 1, Line 1736The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION |
 |
|
|
HaYen
Starting Member
6 Posts |
Posted - 2011-06-21 : 16:38:21
|
| Yes Charlie you are correct. I didn't state how I knew they were dups just that they were dups. The core of the problem isn't identifying duplicate customer rows. The customer ID isn't even three characters long but anywhere from 4 to 9 characters in length. The problem is grouping the output to one customer id so a clerk can go through the process of merging the records in to one active account.How we got duplicates to begin with was registration data coming in from cards or website. Some customers would use a full name like Richard on the registration card and then register again with a shorten name like Dick. No one really cared too much about duplicates before I got here. Now I can identify 99.997% (based on visual checking my duplicate result set to the Customer table) of the duplicates in the system and the request from the clerks to show the results as per their requirements.If you're still interested in how I identified the dups the Customer table is compared to the Customer table. The Last Name, First Letter of the firstname, DOB, AND last for of the SSN is compared to the equivalent fields. If the last name is hyphenated (Mary Smith-Jones) or has a space in (Jose Del La Rosa) it parses out one of the last names and compares it to other customers with the same last name, first letter of the first name, dob, and last 4 of their SSN. So Mary Smith-Jones, 01/01/1986, 0211 would match to Mary Jones, 01/01/1986, 0211. Or Jose Del La Rosa 01/01/1986, 2011 would match to Juan Rosa, 01/01/1986, 0211. Each duplicate has to be verified and data merged by a clerk. In the example above Mary Jones is an exact match, but Jose Del La Rosa is not Juan Del La Rosa (well probably not and that's why it has to be manually verified)HaYen |
 |
|
|
HaYen
Starting Member
6 Posts |
Posted - 2011-06-22 : 15:01:21
|
The code below gets me what I want if there are two Customers that are duplicated. If a customer has 3, 4, or more duplicates ... well it still works but I have to wrap the UPDATE statement in a loop for a predetermined number of iterations. I tried until ROWCOUNT = 0 unsuccessfully. Idea's? /*** If the temporary tables used in this SP exists, drop them*/IF EXISTS (SELECT * FROM tempdb..sysobjects WHERE OBJECT_ID('tempdb..#CUSTDUPS') IS NOT NULL) DROP TABLE #CUSTDUPSIF EXISTS (SELECT * FROM tempdb..sysobjects WHERE OBJECT_ID('tempdb..#DupesPivot') IS NOT NULL) DROP TABLE #DupesPivotIF EXISTS (SELECT * FROM tempdb..sysobjects WHERE OBJECT_ID('tempdb..#DupesHierarchy') IS NOT NULL) DROP TABLE #DupesHierarchyCREATE TABLE #CUSTDUPS( [v1CustID] [int] NOT NULL, [v2CustID] [int] NOT NULL, [MatchField] [char](10) NOT NULL)CREATE TABLE #DupesPivot ( ID1 INT, ID2 INT, NAME CHAR(1), DLN CHAR(1), DIST CHAR(1))INSERT INTO #CUSTDUPS VALUES (101,102,'NAME'), (101,102,'DLN'), (101,102,'DIST'), (103,104,'NAME'), (102,105,'DLN')---SELECT * FROM #CUSTDUPS---UPDATE v1 SET v1.v1CustID = v2.v1CustIDFROM #CUSTDUPS AS v1 INNER JOIN #CUSTDUPS AS v2 ON v1.v1CustID = v2.v2CustID--- INSERT INTO #DupesPivotSELECT v1CUSTID, v2CUSTID, NAME = MAX(CASE MatchField WHEN 'NAME' THEN 'Y' ELSE 'N' END), DLN = MAX(CASE MatchField WHEN 'DLN' THEN 'Y' ELSE 'N' END), DIST = MAX(CASE MatchField WHEN 'DIST' THEN 'Y' ELSE 'N' END)FROM #CUSTDUPSGROUP BY v1CUSTID, v2CUSTID |
 |
|
|
|
|
|
|
|