Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Deleting Duplicate Records

Author  Topic 

TRACS
Starting Member

5 Posts

Posted - 2009-07-11 : 18:21:45
I have a table that has a primary key that has duplicate records. I am trying to clean up this table so that I can put an identity index on the PK and fix this once and for all. There is a LAST_UPDATE_DT (datetime field) that is changed any time any part of the record is changed. I need to delete all of the duplicates and leave the record with that newest Last_Updated Date? All of the posts I have found are for records that are exactly the same (true duplicates) where my records are actually unique but have the same Primary Key value. I need to keep the record that has the latest changes. Your help is greatly appreciated for this rookie to the SQL Server world. Please don't assume that I know anything in your explanation.

Table Structure:
NRR_ID PK
NRR_TITLE varchar
NRR_CREATE_DT datetime
NRR_NARRATIVE varchar(max)
LAST_UPDATE_DT datetime

Thanks,
TRACS

webfred
Master Smack Fu Yak Hacker

8781 Posts

Posted - 2009-07-11 : 18:37:17
delete t
from table t
join
(select
nrr_id,
last_update_dt,
row_number() over (partition by nrr_id order by last_update_dt desc) as rownum
)t2
on t.nrr_id=t2.nrr_id and t.last_update_dt = t2.last_update_dt and t2.rownum > 1


No, you're never too old to Yak'n'Roll if you're too young to die.
Go to Top of Page

TRACS
Starting Member

5 Posts

Posted - 2009-07-11 : 18:55:34
Webfred,

Thanks for the quick response. I have a couple of questions about your code that I'd appreciate an explanation of:

1. You wrote: "Delete T" I assume that I should write "Delete SDC_NARRATIVES" since this is the name of the Table
2. You wrote: "from Table t" Again I should write "FROM SDC_NARRATIVES"
3. You used row_number() over (partition by nrr_id order by last_update_dt desc) as rownum)t2 If I get this right when you use "t2" that is an alias for a temporary table? I see these aliases in alot of code posts and assume that it just makes it easier to write in response to people's questions without having to use their actual table names? If not than please explain.
4. I want to delete all the rows with the same NRR_ID but not the row that has the last changes (as determined by the newest last_update_dt field. This is what your query does?

I appreciate your simple explanation of your code so i can learn and understand this so I can use it in the future for other things. THe row_number() is a function that will give the records a unique value for deleting purposes later? and the Partition by function somehow says: "give me the record by nrr_id with the newest last_update_dt and then delete all the other records with the same nrr_id. Right? If so this is exactly what I need. In your join you don't list all the fields in the table. Is the delete accomplished on the whole row by the delete statement at the top? Thanks again for your help with this problem.

TRACS
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2009-07-12 : 04:11:41
1. T is table alias. its not used as a short name for tables so that you dont have to repeat long table names everywhere in query. you can use them as it is given in actual query also.Your assumption on row_number and partition is correct. it will group data within table by partition by fields nrr_id and give values to each group based on order in order by (last_update_dt desc) so that latest record will 1 and going backwards 2,3,4,.. so when you delete > 1 it will delete all except latest one for each NRR_ID group.you dont have to specify column names in delete as you always delete as a row rather than individual columns.
so you query should be something like

delete t
from SDC_NARRATIVES t
left join
(select
nrr_id,
row_number() over (partition by nrr_id order by last_update_dt desc) as rownum
from SDC_NARRATIVES
)t2
on t.nrr_id=t2.nrr_id
and t.last_update_dt = t2.last_update_dt
and t2.rownum = 1
where t2.nrr_id is null

before you apply this make sure you replace delete t in above with select t.* and see the records that you're going to delete to make sure thats exactly what you want
Go to Top of Page
   

- Advertisement -