Need Help With Recursive results

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2008 Forums

Transact-SQL (2008)

Need Help With Recursive results

Author

Topic

HaYen
Starting Member

6 Posts

Posted - 2011-06-18 : 03:59:29

Here is the scenario, we have a Customer DB that has duplicate Customers with different CUSTID values. We plan on keeping the oldest CUSTOMER Record and merge any newer records into the Oldest one and cancel the newer records:

tblDUPS:


CUSTNAME      CUSTID CUSTID2 MatchField
Richard Smith   101    102     NAME
Dick Smith      102    105     NAME
Robert Bug      103    104     CITY
Dickie Smith    105    106     CITY

The objective is relate all three rows to the smallest CUSTID and PIVOT the MatchField values:


RESULT SET                   
CUSTID1  CUSTID2   NAME   DLN   CITY
101       102       Y      N      N
101       105       N      Y      N
101       106       N      N      Y
103       104       N      N      Y

In the example above, I have already identified the duplicate rows in the the Customer Table and populated table tblDUPS. I'm just having mental block on rendering the last result set. There is no number to the amount of duplicates that can be in the table. In the example above CUSTID's 101,102,105, and 106 all belong to the same customer. Eventually, we will roll all purchase history into the smallest (oldest) CUSTID's (101,103) and Cancel the high CUSTID's (102,104,105,106).

Any idea's?

HaYen
Starting Member

6 Posts

Posted - 2011-06-21 : 02:54:55

BBT

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2011-06-21 : 11:09:48

How do you work out that all of these:
CUSTNAME CUSTID CUSTID2 MatchField
Richard Smith 101 102 NAME
Dick Smith 102 105 NAME
Robert Bug 103 104 CITY
Dickie Smith 106 105 CITY

All belong to the same customer? There's nothing there that I can see indicates that.

You'll need to tell us *how* to define which rows belong to a particular customer.

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION

HaYen
Starting Member

6 Posts

Posted - 2011-06-21 : 16:38:21

Yes Charlie you are correct. I didn't state how I knew they were dups just that they were dups. The core of the problem isn't identifying duplicate customer rows. The customer ID isn't even three characters long but anywhere from 4 to 9 characters in length. The problem is grouping the output to one customer id so a clerk can go through the process of merging the records in to one active account.

How we got duplicates to begin with was registration data coming in from cards or website. Some customers would use a full name like Richard on the registration card and then register again with a shorten name like Dick. No one really cared too much about duplicates before I got here. Now I can identify 99.997% (based on visual checking my duplicate result set to the Customer table) of the duplicates in the system and the request from the clerks to show the results as per their requirements.

If you're still interested in how I identified the dups the Customer table is compared to the Customer table. The Last Name, First Letter of the firstname, DOB, AND last for of the SSN is compared to the equivalent fields. If the last name is hyphenated (Mary Smith-Jones) or has a space in (Jose Del La Rosa) it parses out one of the last names and compares it to other customers with the same last name, first letter of the first name, dob, and last 4 of their SSN.

So Mary Smith-Jones, 01/01/1986, 0211 would match to Mary Jones, 01/01/1986, 0211. Or Jose Del La Rosa 01/01/1986, 2011 would match to Juan Rosa, 01/01/1986, 0211. Each duplicate has to be verified and data merged by a clerk. In the example above Mary Jones is an exact match, but Jose Del La Rosa is not Juan Del La Rosa (well probably not and that's why it has to be manually verified)

HaYen

HaYen
Starting Member

6 Posts

Posted - 2011-06-22 : 15:01:21

The code below gets me what I want if there are two Customers that are duplicated. If a customer has 3, 4, or more duplicates ... well it still works but I have to wrap the UPDATE statement in a loop for a predetermined number of iterations. I tried until ROWCOUNT = 0 unsuccessfully.

Idea's?


/*
**  If the temporary tables used in this SP exists, drop them
*/
IF EXISTS (SELECT * FROM tempdb..sysobjects WHERE OBJECT_ID('tempdb..#CUSTDUPS') IS NOT NULL) DROP TABLE #CUSTDUPS
IF EXISTS (SELECT * FROM tempdb..sysobjects WHERE OBJECT_ID('tempdb..#DupesPivot') IS NOT NULL) DROP TABLE #DupesPivot
IF EXISTS (SELECT * FROM tempdb..sysobjects WHERE OBJECT_ID('tempdb..#DupesHierarchy') IS NOT NULL) DROP TABLE #DupesHierarchy

CREATE TABLE #CUSTDUPS(
      [v1CustID] [int] NOT NULL,
      [v2CustID] [int] NOT NULL,
      [MatchField] [char](10) NOT NULL
)

CREATE TABLE #DupesPivot (
      ID1 INT,
      ID2 INT,
      NAME CHAR(1),
      DLN CHAR(1),
      DIST CHAR(1)
)

INSERT INTO #CUSTDUPS VALUES
      (101,102,'NAME'),
      (101,102,'DLN'),
      (101,102,'DIST'),
      (103,104,'NAME'),
      (102,105,'DLN')

---

SELECT * FROM #CUSTDUPS

---

UPDATE
      v1 
SET
      v1.v1CustID = v2.v1CustID
FROM 
      #CUSTDUPS AS v1 INNER JOIN #CUSTDUPS AS v2 ON v1.v1CustID = v2.v2CustID

--- 

INSERT INTO #DupesPivot
SELECT
      v1CUSTID,
      v2CUSTID,
      NAME = MAX(CASE MatchField WHEN 'NAME' THEN 'Y' ELSE 'N' END),
      DLN  = MAX(CASE MatchField WHEN 'DLN'  THEN 'Y' ELSE 'N' END),
      DIST = MAX(CASE MatchField WHEN 'DIST' THEN 'Y' ELSE 'N' END)
FROM
      #CUSTDUPS
GROUP BY
      v1CUSTID,
      v2CUSTID

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources