Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Need Help With Recursive results

Author  Topic 

HaYen
Starting Member

6 Posts

Posted - 2011-06-18 : 03:59:29

Here is the scenario, we have a Customer DB that has duplicate Customers with different CUSTID values. We plan on keeping the oldest CUSTOMER Record and merge any newer records into the Oldest one and cancel the newer records:

tblDUPS:

CUSTNAME CUSTID CUSTID2 MatchField
Richard Smith 101 102 NAME
Dick Smith 102 105 NAME
Robert Bug 103 104 CITY
Dickie Smith 105 106 CITY


The objective is relate all three rows to the smallest CUSTID and PIVOT the MatchField values:


RESULT SET
CUSTID1 CUSTID2 NAME DLN CITY
101 102 Y N N
101 105 N Y N
101 106 N N Y
103 104 N N Y


In the example above, I have already identified the duplicate rows in the the Customer Table and populated table tblDUPS. I'm just having mental block on rendering the last result set. There is no number to the amount of duplicates that can be in the table. In the example above CUSTID's 101,102,105, and 106 all belong to the same customer. Eventually, we will roll all purchase history into the smallest (oldest) CUSTID's (101,103) and Cancel the high CUSTID's (102,104,105,106).

Any idea's?

HaYen
Starting Member

6 Posts

Posted - 2011-06-21 : 02:54:55
BBT
Go to Top of Page

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2011-06-21 : 11:09:48
How do you work out that all of these:
CUSTNAME CUSTID CUSTID2 MatchField
Richard Smith 101 102 NAME
Dick Smith 102 105 NAME
Robert Bug 103 104 CITY
Dickie Smith 106 105 CITY

All belong to the same customer? There's nothing there that I can see indicates that.

You'll need to tell us *how* to define which rows belong to a particular customer.

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION
Go to Top of Page

HaYen
Starting Member

6 Posts

Posted - 2011-06-21 : 16:38:21
Yes Charlie you are correct. I didn't state how I knew they were dups just that they were dups. The core of the problem isn't identifying duplicate customer rows. The customer ID isn't even three characters long but anywhere from 4 to 9 characters in length. The problem is grouping the output to one customer id so a clerk can go through the process of merging the records in to one active account.

How we got duplicates to begin with was registration data coming in from cards or website. Some customers would use a full name like Richard on the registration card and then register again with a shorten name like Dick. No one really cared too much about duplicates before I got here. Now I can identify 99.997% (based on visual checking my duplicate result set to the Customer table) of the duplicates in the system and the request from the clerks to show the results as per their requirements.

If you're still interested in how I identified the dups the Customer table is compared to the Customer table. The Last Name, First Letter of the firstname, DOB, AND last for of the SSN is compared to the equivalent fields. If the last name is hyphenated (Mary Smith-Jones) or has a space in (Jose Del La Rosa) it parses out one of the last names and compares it to other customers with the same last name, first letter of the first name, dob, and last 4 of their SSN.

So Mary Smith-Jones, 01/01/1986, 0211 would match to Mary Jones, 01/01/1986, 0211. Or Jose Del La Rosa 01/01/1986, 2011 would match to Juan Rosa, 01/01/1986, 0211. Each duplicate has to be verified and data merged by a clerk. In the example above Mary Jones is an exact match, but Jose Del La Rosa is not Juan Del La Rosa (well probably not and that's why it has to be manually verified)

HaYen
Go to Top of Page

HaYen
Starting Member

6 Posts

Posted - 2011-06-22 : 15:01:21
The code below gets me what I want if there are two Customers that are duplicated. If a customer has 3, 4, or more duplicates ... well it still works but I have to wrap the UPDATE statement in a loop for a predetermined number of iterations. I tried until ROWCOUNT = 0 unsuccessfully.

Idea's?





/*
** If the temporary tables used in this SP exists, drop them
*/
IF EXISTS (SELECT * FROM tempdb..sysobjects WHERE OBJECT_ID('tempdb..#CUSTDUPS') IS NOT NULL) DROP TABLE #CUSTDUPS
IF EXISTS (SELECT * FROM tempdb..sysobjects WHERE OBJECT_ID('tempdb..#DupesPivot') IS NOT NULL) DROP TABLE #DupesPivot
IF EXISTS (SELECT * FROM tempdb..sysobjects WHERE OBJECT_ID('tempdb..#DupesHierarchy') IS NOT NULL) DROP TABLE #DupesHierarchy

CREATE TABLE #CUSTDUPS(
[v1CustID] [int] NOT NULL,
[v2CustID] [int] NOT NULL,
[MatchField] [char](10) NOT NULL
)

CREATE TABLE #DupesPivot (
ID1 INT,
ID2 INT,
NAME CHAR(1),
DLN CHAR(1),
DIST CHAR(1)
)

INSERT INTO #CUSTDUPS VALUES
(101,102,'NAME'),
(101,102,'DLN'),
(101,102,'DIST'),
(103,104,'NAME'),
(102,105,'DLN')

---

SELECT * FROM #CUSTDUPS

---

UPDATE
v1
SET
v1.v1CustID = v2.v1CustID
FROM
#CUSTDUPS AS v1 INNER JOIN #CUSTDUPS AS v2 ON v1.v1CustID = v2.v2CustID

---

INSERT INTO #DupesPivot
SELECT
v1CUSTID,
v2CUSTID,
NAME = MAX(CASE MatchField WHEN 'NAME' THEN 'Y' ELSE 'N' END),
DLN = MAX(CASE MatchField WHEN 'DLN' THEN 'Y' ELSE 'N' END),
DIST = MAX(CASE MatchField WHEN 'DIST' THEN 'Y' ELSE 'N' END)
FROM
#CUSTDUPS
GROUP BY
v1CUSTID,
v2CUSTID
Go to Top of Page
   

- Advertisement -