QUERY to find "non-identical" duplicates in my tab

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

Transact-SQL (2000)

QUERY to find "non-identical" duplicates in my tab

Author

Topic

amgrace
Starting Member

30 Posts

Posted - 2004-06-01 : 10:32:21

Hi,
I am trying to find rows in my SQL Server table where important details on duplicate records do not match each other (i.e MemberID is not Unique in this table as there could be several enries for one member, but for each of these duplicate, the FirstName and Surname should be the same).

So, If MemberID = 123
and Surname = SMITH
Then all other rows with MemberID 123 should have Surname SMITH.

Can you please help with this query. This table in populated manually and I need to find typo errors.
Thank you

RickD
Slow But Sure Yak Herding Master

3608 Posts

Posted - 2004-06-01 : 10:37:36

[code]
select distinct memberid, firstname, surname from table
order by memberid
[/code]

amgrace
Starting Member

30 Posts

Posted - 2004-06-01 : 10:49:33

Thanks for the quick reply.
For non-identical duplicate records, I am atually looking for an output like this:

MemberID Surname
123 BROWN
123 BLACK
456 YELLOW
456 BLUE
456 ORANGE

OR

MemberID Surname
123 BROWN
BLACK
456 YELLOW
BLUE
ORANGE

Thank you.

amgrace
Starting Member

30 Posts

Posted - 2004-06-01 : 10:52:59

that sure didn't come out as i expected.........the spaces between the ouput seem to have dissapeared.....hope you still understand what am trying to achieve here.
thanks

RickD
Slow But Sure Yak Herding Master

3608 Posts

Posted - 2004-06-01 : 10:54:27

Have you tried running the query? You will get non-identical 'duplicates', you will get them by firstname and surname, but you will get all other records as well... This is due to your design and letting users add other names to an existing memberid...

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-06-01 : 11:55:52

Yes the design is not that great.. but maybe this will help:

Declare @myTable table (memberId int, firstName nvarchar(20), surName nvarchar(20))
Insert Into @myTable Select memberId=1, firstName = 'Corey', surName = 'Aldebol'
Insert Into @myTable Select memberId=1, firstName = 'Corey', surName = 'Aldebol'
Insert Into @myTable Select memberId=1, firstName = 'Corey', surName = 'Smith'
Insert Into @myTable Select memberId=2, firstName = 'John', surName = 'Smith'

Select
A.*
From
(Select distinct memberId, firstName, surName From @myTable) as A,
(Select distinct memberId, firstName, surName From @myTable) as B
Where A.memberId = B.memberId
and (A.firstName<>B.firstName or A.surName<>B.surName)

amgrace
Starting Member

30 Posts

Posted - 2004-06-02 : 06:55:48

Thanks for that reply. I know the design isn't great. We extract the data from a systems that enters them manually. Each member can have several approved entries on the system but each entry has entry has an assigned unique key on our DB.

I tried the code fron SeventhNight and it worked fine until I added some more lines to the insert (just to further test it):

Declare @myTable table (memberId int, firstName nvarchar(20), surName nvarchar(20))
Insert Into @myTable Select memberId=1, firstName = 'Corey', surName = 'Aldebol'
Insert Into @myTable Select memberId=1, firstName = 'Corey', surName = 'Aldebol'
Insert Into @myTable Select memberId=1, firstName = 'Corey', surName = 'Smith'
Insert Into @myTable Select memberId=2, firstName = 'John', surName = 'Smith'
Insert Into @myTable Select memberId=2, firstName = 'John', surName = 'Smith'
Insert Into @myTable Select memberId=2, firstName = 'Grass', surName = 'Smith'
Insert Into @myTable Select memberId=2, firstName = 'John', surName = 'Smith'
Insert Into @myTable Select memberId=1, firstName = 'Brown', surName = 'Aldebol'

The ouput now seems a bit confused i.e There are only 4 entries for member1 in @mytable but the ouput retured 6.

memberId firstName surName
1 Corey Aldebol
1 Corey Smith
1 Brown Aldebol
1 Corey Smith
1 Brown Aldebol
1 Corey Aldebol
2 John Smith
2 Grass Smith

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-06-02 : 07:17:24

Ooops...

add one more distinct to the outer most select:
Select Distinct
A.*
From
(Select distinct memberId, firstName, surName From @myTable) as A,
(Select distinct memberId, firstName, surName From @myTable) as B
Where A.memberId = B.memberId
and (A.firstName<>B.firstName or A.surName<>B.surName)

amgrace
Starting Member

30 Posts

Posted - 2004-06-02 : 08:13:51

WORKS PERFECTLY, thanks.

I replaced the @mytable with the name of my table and ran only the outer most select i.e

Select Distinct
A.*
From
(Select distinct MemberId, FirstName, SurName From Members) as A,
(Select distinct MemberId, FirstName, SurName From Members) as B
Where A.MemberId = B.MemberId
and (A.FirstName<>B.FirstName or A.SurName<>B.SurName)

but its been running for about 20 minutes now, with no sign of finishing soon!

As this is a very large database, would the runtime be faster if I create 2 views or tables A and B consisting of the reqired data (Select distinct memberId, firstName, surName from Members),

then SELECT INTO a totally different table (DuplicateMembers) the outcome of the 2 compared views/tables

thanks

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-06-02 : 08:46:57

Try creating a temp table (or table variable) for distinct members:

Declare @dMembers Table (memberId int, firstName nvarchar(20), surName nvarhcar(20))
Insert Into @dMemebers
Select Distinct MemberId, FirstName, SurName From Members

then run the filter select. it should work much better. I've never tried it with views, they may be fast enough, but I think that the temp table/variable table will be faster on the whole.

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2004-06-02 : 08:51:00

Does this return what you are looking for?

select tbl.* from tbl
inner join
(select memberID from tbl group by memberID having count(*) > 1) b
on
tbl.memberID = b.memberID

- Jeff

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-06-02 : 09:20:39

Yeah I can never remember how the whole having thing works... that sounds a bit better to me

Corey

amgrace
Starting Member

30 Posts

Posted - 2004-06-03 : 06:58:45

Once again, thank you all for you help.
All the tips/codes worked - but they took a very long time to run. So, I deceded to go try the view route - I seem to get faster output.

WHAT I DID
1. Created 2 identical views of the required data.
2. A SELECT query to compare/filter the two views show the result.

I am thinking of creating a stored procedure for these processes - that will be under a different topic!!!!

thanks

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources