Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Trouble querying for duplicates

Author  Topic 

indio
Starting Member

7 Posts

Posted - 2007-03-13 : 10:27:27
Hi,
I am trying to query for duplicate records. The fields that I am comparing is coming from two diferent different tables.

Example:

Table : Member

DOB SSN Phone FullName entid Member No
-------------------------------------------------------------------------------------------------
01/02/2007 123456789 281-098-1234 John Original Smith 01 A1

01/02/2007 123456789 713-098-1234 John Duplicate Smith 02 B2

06-02-1964 987654321 817-098-6754 Paul Original Harris 03 A3




Table: Entity

firstname MiddleName lastname City entid
------------------------------------------------------------------------
John Original Smith Phoenix 01

John Duplicate Smith Phoenix 02

Paul Original Harris Dallas 03




Desired Output:

Member No(original) Member No (duplicate) DOB SSN firstname lastname
----------------------------------------------------------------------------------------------------

A1 B2 01/02/2007 123456789 John Smith



I need to compare on fields DOB, SSN, 1st 3 chars of firstname and 1st 3 chars of last name. I am able to bring out the duplicates on each table when I query like this:

Step 1:

select * from member x

join (

select dob, ssn

from member group by dob, ssn

having count (*) > 1)y

on x.dob=y.dob and x.ssn=y.ssn



But, I am lost on how to join the 2 tables and when I query using Substring on the name. I cannot join the tables using UNION. How do I go about this ? Can anyone help ?

snSQL
Master Smack Fu Yak Hacker

1837 Posts

Posted - 2007-03-13 : 13:33:33
You can group and join on the substrings

select * from member x

join (

select dob, ssn,
left(firstname, 3) as firstname,
left(lastname, 3) as lastname

from member group by dob, ssn, left(firstname, 3), left(lastname, 3)

having count (*) > 1)y

on x.dob=y.dob and x.ssn=y.ssn
and left(x.firstname, 3)=y.firstname
and left(x.lastname, 3)=y.lastname
Go to Top of Page

indio
Starting Member

7 Posts

Posted - 2007-03-14 : 14:27:55
snSQL:

Thanks for the response. I tried a similar way with a slight modification and got 2 duplicate records; but the part I am really stuck is with the logic for the final output.


Example in this case:

We know that John Original Smith and John Duplicate Smith are 2 records for the same person as he has 2 different member numbers. After finding out this I want to get the data in the following format in only one record:


Member No ( Original), Member No (Duplicate), Firstname (Original), LastName(Original), SSN (Original).

All that I need in the output is to find out the member no for John Duplicate Smith and add it to the record for John Original Smith.

In this case it should read:

A1, B2, John, Smith, 123456789

Any suggestions ?

Thank You !
Go to Top of Page

snSQL
Master Smack Fu Yak Hacker

1837 Posts

Posted - 2007-03-14 : 15:57:11
This should give you exactly that
SELECT m1.[Member No], m2.[Member No], e1.firstname, e1.lastname, m1.SSN
FROM Member m1
INNER JOIN Member m2 ON m1.SSN = m2.SSN AND m1.DOB = m2.DOB
INNER JOIN Entity e1 ON e1.entid = m1.entid
INNER JOIN Entity e2 ON e2.entid = m2.entid
WHERE left(e1.firstname, 3) = left(e2.firstname, 3)
AND left(e1.lastname, 3) = left(e2.lastname, 3)
AND m1.[Member No] < m2.[Member No]
Go to Top of Page
   

- Advertisement -