Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Elminiating duplicates

Author  Topic 

Abu-Dina
Posting Yak Master

206 Posts

Posted - 2008-06-25 : 09:05:47
Greetings all,

I have the following two tables:


if object_id('tempdb..##mYAddress') > 0
drop table ##mYAddress

create table ##mYAddress(
AddressId int identity(1, 1)
,Addr1 varchar(50)
,Addr2 varchar(50)
,Addr3 varchar(50)
,Town varchar(50)
,Postcode varchar(50)
,AddressScore tinyint
)

insert into ##mYAddress (Addr1, Addr2, Addr3, Town, Postcode, AddressScore)
select 'ROYAL AIR FORCE', 'CRINKLY BOTTOM', 'CRINKLINGTON', 'CRINKLINGTONSHIRE', 'CRI L1N', 5 union all
select 'ROYAL AIR FORCE', 'CRINKLY BOTTOM', 'CRINKLINGTON', 'CRINKLINGTONSHIRE', 'CRI L1N', 8 union all
select '54 BATEMAN CLOSE', 'LARKING', 'LARKINGHAM', 'LARKINGHAMSHIRE', 'LAR K1B ', 2 union all
select '54 BATEMAN CLOSE', 'LARKING', 'LARKINGHAM', 'LARKINGHAMSHIRE', 'LAR K1B ', 3

if object_id('tempdb..##myResidence') > 0
drop table ##myResidence

create table ##myResidence(
PersonId int identity(1, 2)
,AddressId int
)

insert into ##myResidence (AddressId)
select 1 union all
select 1 union all
select 2 union all
select 3 union all
select 4



If we run the following query:


select a.*, b.*
from ##myResidence as a
inner join ##mYAddress as b
on a.AddressId = b.AddressId


We can see that PersonId 1 and 5 have different AddressId but the actual address is exactly the same.

This is cause by the following duplicate records:


select a.AddressId, b.AddressId, a.Addr1, b.Addr1, a.Addr2, b.Addr2, a.Town, b.Town, a.Postcode, b.Postcode
from ##mYAddress a
join ##mYAddress b on a.Addr1 = b.Addr1
and isnull(a.Addr2,'xxx') = isnull(b.Addr2,'xxx')
and isnull(a.Addr3,'xxx') = isnull(b.Addr3,'xxx')
and a.Town = b.Town
and a.Postcode = b.Postcode
where a.AddressId <> b.AddressId and a.AddressId < b.AddressId


What I need to do is elminiate the duplicates from ##mYAddress table and only keep the record that has the lowest AddressId. I then need to update my ##myResidence table to update the records that have their AddressId deleted.

Can anyone help me?

Regards.

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-06-25 : 09:17:53
[code]DECLARE @DeletedId table
(
AddressID int,
Addr1 varchar(50)
,Addr2 varchar(50)
,Addr3 varchar(50)
,Town varchar(50)
,Postcode varchar(50)
)

DELETE t
OUTPUT DELETED.AddressID,
DELETED.Addr1,
DELETED.Addr2,
DELETED.Addr3,
DELETED.Town,
DELETED.Postcode
INTO @DeletedId
FROM
(SELECT ROW_NUMBER() OVER(PARTITION BY Addr1, Addr2, Addr3, Town, Postcode ORDER BY AddressScore) AS RowNo,*
FROM ##mYAddress)t
WHERE t.RowNo=1


UPDATE t
SET t.AddressId=a.AddressId
FROM ##myResidence t
INNER JOIN @DeletedId d
ON d.AddressID=t.AddressID
INNER JOIN ##mYAddress a
ON a.Addr1 = d.Addr1
AND a.Addr2=d.Addr2
AND a.Addr3=d.Addr3
AND a.Town=d.Town
AND a.Postcode=d.Postcode[/code]
Go to Top of Page

Abu-Dina
Posting Yak Master

206 Posts

Posted - 2008-06-25 : 09:28:05
Thanks visakh16, this is a very nice solution but how can I keep the lowest AddressId and remove the higher valued ones? So for PersonId 1, 3 and 5 it would set it AdressId of 1 and not 2?

Thanks for your help in advance.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-06-25 : 09:34:49
quote:
Originally posted by Abu-Dina

Thanks visakh16, this is a very nice solution but how can I keep the lowest AddressId and remove the higher valued ones? So for PersonId 1, 3 and 5 it would set it AdressId of 1 and not 2?

Thanks for your help in advance.


it will keep lowest Address Id provided your addressscore is coming alphabetically or you can even do this

DELETE t
OUTPUT DELETED.AddressID,
DELETED.Addr1,
DELETED.Addr2,
DELETED.Addr3,
DELETED.Town,
DELETED.Postcode
INTO @DeletedId
FROM
(SELECT ROW_NUMBER() OVER(PARTITION BY Addr1, Addr2, Addr3, Town, Postcode ORDER BY AddressId) AS RowNo,*
FROM ##mYAddress)t
WHERE t.RowNo=1


however, i didint understand how you'll get adressid 1 for personids 1,3 & 5. what i'm doing is taking those address that were deleted by joining to @deletedID and then joining onto ##mYAddress to get id that's remaining with same postcode,addr1,addr2,... value.so according to my logic i think you will update reord with address id 3 to 2 in ##myResidence table.rest all remains same.
Go to Top of Page

Abu-Dina
Posting Yak Master

206 Posts

Posted - 2008-06-25 : 09:40:06
I tried adding the order by clause as you have put it but it still doesn't work. For example, PersonId 9 should remain set to Address Id 4 and not 5.

Actually I think its the last update that doesn't work.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-06-25 : 12:48:57
quote:
Originally posted by Abu-Dina

I tried adding the order by clause as you have put it but it still doesn't work. For example, PersonId 9 should remain set to Address Id 4 and not 5.

Actually I think its the last update that doesn't work.


b/w i just had a look at your intial post again. there you have specified
PersonId 1 and 5 have different AddressId (1 & 4) but the actual address is exactly the same.

but i dont think its true. your addres values are

insert into ##mYAddress (Addr1, Addr2, Addr3, Town, Postcode, AddressScore)
select 'ROYAL AIR FORCE', 'CRINKLY BOTTOM', 'CRINKLINGTON', 'CRINKLINGTONSHIRE', 'CRI L1N', 5 union all
select 'ROYAL AIR FORCE', 'CRINKLY BOTTOM', 'CRINKLINGTON', 'CRINKLINGTONSHIRE', 'CRI L1N', 8 union all
select '54 BATEMAN CLOSE', 'LARKING', 'LARKINGHAM', 'LARKINGHAMSHIRE', 'LAR K1B ', 2 union all
select '54 BATEMAN CLOSE', 'LARKING', 'LARKINGHAM', 'LARKINGHAMSHIRE', 'LAR K1B ', 3


then how do you think address id 1 (ROYAL AIR FORCE) & 4(54 BATEMAN CLOSE') are same? can you explain?
Go to Top of Page

Abu-Dina
Posting Yak Master

206 Posts

Posted - 2008-06-26 : 09:37:40
quote:
Originally posted by visakh16

quote:
Originally posted by Abu-Dina

I tried adding the order by clause as you have put it but it still doesn't work. For example, PersonId 9 should remain set to Address Id 4 and not 5.

Actually I think its the last update that doesn't work.


b/w i just had a look at your intial post again. there you have specified
PersonId 1 and 5 have different AddressId (1 & 4) but the actual address is exactly the same.

but i dont think its true. your addres values are

insert into ##mYAddress (Addr1, Addr2, Addr3, Town, Postcode, AddressScore)
select 'ROYAL AIR FORCE', 'CRINKLY BOTTOM', 'CRINKLINGTON', 'CRINKLINGTONSHIRE', 'CRI L1N', 5 union all
select 'ROYAL AIR FORCE', 'CRINKLY BOTTOM', 'CRINKLINGTON', 'CRINKLINGTONSHIRE', 'CRI L1N', 8 union all
select '54 BATEMAN CLOSE', 'LARKING', 'LARKINGHAM', 'LARKINGHAMSHIRE', 'LAR K1B ', 2 union all
select '54 BATEMAN CLOSE', 'LARKING', 'LARKINGHAM', 'LARKINGHAMSHIRE', 'LAR K1B ', 3


then how do you think address id 1 (ROYAL AIR FORCE) & 4(54 BATEMAN CLOSE') are same? can you explain?



Thanks Visakh!!

I modified the code slightly and this is my working version:


DELETE t
OUTPUT DELETED.DomicileId,
DELETED.DD_Addr1,
DELETED.DD_Addr2,
DELETED.DD_Addr3,
DELETED.DD_Town,
DELETED.DD_Postcode
INTO dbo.AA_DuplicateDomicile
FROM
(SELECT ROW_NUMBER() OVER(PARTITION BY DD_Addr1, isnull(DD_Addr2, 'XXX'), isnull(DD_Addr3, 'XXX'),
DD_Town, DD_Postcode ORDER BY DomicileId) AS RowNo,*
FROM TheTradingFloor.dbo.Domicile)t
WHERE t.RowNo>1
Go to Top of Page
   

- Advertisement -