Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 De-Duplication Problem

Author  Topic 

paul.rowling
Yak Posting Veteran

81 Posts

Posted - 2003-01-30 : 15:57:21
Hi,

I have two tables in question a client table and a transactions table. The
two tables are related via a toro_clientid field. When a client is added to
the client table they are issued with a unique toro_clientid and any
transactions that they make are recorded in the transactions table against
their toro_clientid.
I have some software which I'm using to de-duplicate the clients in the
clients table. When the software finds a duplicate pair it assigns the two
records a duplicate code. e.g J Bloggs living at 1 Wicklow street and Joe
bloggs living at 1 wicklow street would be assigned "1" in their dup_code
field in each record. The next duplicate pair found would then be assigned
"2" and so on......
What I want to be able to do is, when a duplicate is found, transfer all of
the transactions from the duplicate client to the unique client so that the
transaction history is preserved. i.e J Bloggs transactions would be
assigned to Joe Bloggs instead. To do this the toro_clientid in the
transactions table needs to be updated to the toro_clientid of the unique
client. How do we know which client to use as the unique client I hear you
ask?? Well, there is a field in the client table called "Dateenrolled"
which tells you when the client enrolled at a particular store. So I want
to keep the client who has the most recent enrolled date (as this means the
address details will be most up to date) and discard the other client.
Now for the finale:
How the HELL do I do this in sql? Is it possible? And if so is it possible
to update all records automatically?

Any help would be very, very appreciated!!!

P.S I have over 80, 000 clients with over 2.5 million associated
transactions to check and de-dupe so you can see why I need to automate the
process.

Cheers

Paul


X002548
Not Just a Number

15586 Posts

Posted - 2003-01-30 : 17:07:50
Ok, it took a while but I think I have a solution for you...how it will work against 2.5 million is another story. There is one slight problem I realized when I was getting near the end was, what if a clientid has more than 1 duplicate? I think the code may handle it, but I'll have to test further.

Here you go and Good Luck

Brett

8-)

ps you should be able to cut and paste the example and execute it...


Create Table Client (Client_ID Int, Add_Dt Datetime, Dup_Id Int)
GO

Delete From Client
GO

Insert Into Client (Client_Id, Add_Dt, Dup_Id) Select 1,'2003-01-01', 1
Insert Into Client (Client_Id, Add_Dt) Select 2,'2003-01-01'
Insert Into Client (Client_Id, Add_Dt, Dup_Id) Select 3,'2003-02-01', 1
Insert Into Client (Client_Id, Add_Dt) Select 4,'2003-02-01'
Go

Select * from Client
Go

Create Table TRANS (TRANS_ID int, Client_Id int)
GO

Insert Into TRANS (TRANS_ID, Client_Id) Select 1,1
Insert Into TRANS (TRANS_ID, Client_Id) Select 2,2
Insert Into TRANS (TRANS_ID, Client_Id) Select 3,3
Insert Into TRANS (TRANS_ID, Client_Id) Select 4,4
Insert Into TRANS (TRANS_ID, Client_Id) Select 5,1
Insert Into TRANS (TRANS_ID, Client_Id) Select 6,2
Insert Into TRANS (TRANS_ID, Client_Id) Select 7,3
Insert Into TRANS (TRANS_ID, Client_Id) Select 7,3
GO
Select * from TRANS
GO
UPDATE T
SET T.Client_Id = XXX.USE_Client_ID
FROM TRANS T INNER JOIN
(SELECT DISTINCT
Case When a.Add_Dt > b.Add_Dt Then a.Client_Id Else b.Client_Id End AS USE_CLient_Id -- This is the Id to Change To
, Case When a.Add_Dt > b.Add_Dt Then b.Client_Id Else a.Client_Id End AS CHANGE_Client_Id -- This is the Id to Change From
FROM Client a INNER JOIN Client b
ON a.Dup_Id = b.Dup_Id
WHERE a.Client_Id <> b.Client_Id
AND EXISTS (SELECT * FROM CLIENT C WHERE a.Client_Id = c.Client_Id HAVING a.Add_Dt = Max(c.Add_Dt))) As XXX
ON T.CLIENT_ID = XXX.CHANGE_Client_Id

GO
Select * From TRANS
GO

Drop Table Client
GO
Drop Table TRANS
GO

Go to Top of Page

paul.rowling
Yak Posting Veteran

81 Posts

Posted - 2003-01-31 : 04:51:40
Thanks alot, I'll give it a try!!!

Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2003-01-31 : 11:21:05
I'll be curious to know how it works out...

Brett

8-)

Go to Top of Page
   

- Advertisement -