Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Delete duplicate rows

Author  Topic 

Almir_7
Starting Member

14 Posts

Posted - 2008-11-12 : 11:10:33
I have the following few rows in the table and need to delete the duplicate rows. The data looks like this.

Row Log_id product created_date
1 INC1758837 1-Gen 11/4/2008 22:10
2 INC1332047 AHRS-G 11/1/2008 16:28
3 INC1765192 AHRS-G 11/5/2008 21:59
4 INC1765192 AHRS-G 11/6/2008 22:30
5 INC1772344 1-Gen 11/9/2008 11:03

The duplicates are in rows 3 and 4. I need to return rows 1, 2, 4 and 5. The only difference between the rows 3 and 4 is the created_date. I need to return the highest date in case there are more then one duplicate rows in this table.

acollins74
Yak Posting Veteran

82 Posts

Posted - 2008-11-12 : 11:15:54
There are several ways to do it; most more complex than the next but this is my way and seems to be the most staight forward.

delete a
from tablename a
inner join tablename b
on a.log_id = b.log_id and a.product = b.product
where a.create_date < b.create_date
Go to Top of Page

Almir_7
Starting Member

14 Posts

Posted - 2008-11-12 : 11:27:03
What if there are more then two instances of the same row in there? Such are the row 6 in my sample data bellow. In which case i need to return rows 1, 2, 5 and 6 because the created_date on row 6 is bigger then the created_date in rows 3 and 4 of the same ticket number.

Row Log_id product created_date
1 INC1758837 1-Gen 11/4/2008 22:10
2 INC1332047 AHRS-G 11/1/2008 16:28
3 INC1765192 AHRS-G 11/5/2008 21:59
4 INC1765192 AHRS-G 11/6/2008 22:30
5 INC1772344 1-Gen 11/9/2008 11:03
6 INC1765192 AHRS-G 11/9/2008 11:10
Go to Top of Page

raky
Aged Yak Warrior

767 Posts

Posted - 2008-11-12 : 11:27:44
If u are using sqlserver 2005 then u can use this

CREATE TABLE #T1 ( row int, log_id varchar(64), product varchar(64), created_Date datetime )
insert into #t1
select 1, 'INC1758837', '1-Gen', '11/4/2008 22:10' union all
select 2, 'INC1332047', 'AHRS-G', '11/1/2008 16:28' union all
select 3, 'INC1765192', 'AHRS-G', '11/5/2008 21:59' union all
select 4, 'INC1765192', 'AHRS-G', '11/6/2008 22:30' union all
select 5, 'INC1772344', '1-Gen', '11/9/2008 11:03'


delete d
from #t1 d
left join ( select row_number() over ( partition by log_id, product order by created_date desc ) as sno, row
from #t1 ) q on q.row = d.row and q.sno = 1
where q.row is null
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-11-12 : 11:32:08
quote:
Originally posted by raky

If u are using sqlserver 2005 then u can use this

CREATE TABLE #T1 ( row int, log_id varchar(64), product varchar(64), created_Date datetime )
insert into #t1
select 1, 'INC1758837', '1-Gen', '11/4/2008 22:10' union all
select 2, 'INC1332047', 'AHRS-G', '11/1/2008 16:28' union all
select 3, 'INC1765192', 'AHRS-G', '11/5/2008 21:59' union all
select 4, 'INC1765192', 'AHRS-G', '11/6/2008 22:30' union all
select 5, 'INC1772344', '1-Gen', '11/9/2008 11:03'


delete d
from #t1 d
left join ( select row_number() over ( partition by log_id, product order by created_date desc ) as sno, row
from #t1 ) q on q.row = d.row and q.sno = 1
where q.row is null


no need og join. the below will do

delete q
from ( select row_number() over ( partition by log_id, product order by created_date desc ) as sno,*
from #t1 ) q
where q.sno > 1

Go to Top of Page

raky
Aged Yak Warrior

767 Posts

Posted - 2008-11-12 : 11:35:35
quote:
Originally posted by visakh16

quote:
Originally posted by raky

If u are using sqlserver 2005 then u can use this

CREATE TABLE #T1 ( row int, log_id varchar(64), product varchar(64), created_Date datetime )
insert into #t1
select 1, 'INC1758837', '1-Gen', '11/4/2008 22:10' union all
select 2, 'INC1332047', 'AHRS-G', '11/1/2008 16:28' union all
select 3, 'INC1765192', 'AHRS-G', '11/5/2008 21:59' union all
select 4, 'INC1765192', 'AHRS-G', '11/6/2008 22:30' union all
select 5, 'INC1772344', '1-Gen', '11/9/2008 11:03'


delete d
from #t1 d
left join ( select row_number() over ( partition by log_id, product order by created_date desc ) as sno, row
from #t1 ) q on q.row = d.row and q.sno = 1
where q.row is null


no need og join. the below will do

delete q
from ( select row_number() over ( partition by log_id, product order by created_date desc ) as sno,*
from #t1 ) q
where q.sno > 1





thanx for optimization..one more change even * is not required

delete q
from ( select row_number() over ( partition by log_id, product order by created_date desc ) as sno,*
from #t1 ) q
where q.sno > 1

Go to Top of Page

Almir_7
Starting Member

14 Posts

Posted - 2008-11-12 : 11:39:47
What if there is no row number column in there? How does the query change?

Log_id product created_date
INC1758837 1-Gen 11/4/2008 22:10
INC1332047 AHRS-G 11/1/2008 16:28
INC1765192 AHRS-G 11/5/2008 21:59
INC1765192 AHRS-G 11/6/2008 22:30
INC1772344 1-Gen 11/9/2008 11:03
INC1765192 AHRS-G 11/9/2008 11:10
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-11-12 : 11:43:32
quote:
Originally posted by Almir_7

What if there is no row number column in there? How does the query change?

Log_id product created_date
INC1758837 1-Gen 11/4/2008 22:10
INC1332047 AHRS-G 11/1/2008 16:28
INC1765192 AHRS-G 11/5/2008 21:59
INC1765192 AHRS-G 11/6/2008 22:30
INC1772344 1-Gen 11/9/2008 11:03
INC1765192 AHRS-G 11/9/2008 11:10


the row_number function generates the number dynamically. No need of presence of seperate column in table with row number value.

Refer below for uses of row_number function

http://sqlblogcasts.com/blogs/madhivanan/archive/2007/08/27/multipurpose-row-number-function.aspx
Go to Top of Page

Almir_7
Starting Member

14 Posts

Posted - 2008-11-12 : 11:44:35
Will this function work in SQL 2000? I'm on SQL 2000 database.
Go to Top of Page

raky
Aged Yak Warrior

767 Posts

Posted - 2008-11-12 : 11:45:31
quote:
Originally posted by Almir_7

What if there is no row number column in there? How does the query change?

Log_id product created_date
INC1758837 1-Gen 11/4/2008 22:10
INC1332047 AHRS-G 11/1/2008 16:28
INC1765192 AHRS-G 11/5/2008 21:59
INC1765192 AHRS-G 11/6/2008 22:30
INC1772344 1-Gen 11/9/2008 11:03
INC1765192 AHRS-G 11/9/2008 11:10



for the below solution to work just u need log_id, product,created_date columns.The query works eventhough row column doesn't exist


delete q
from ( select row_number() over ( partition by log_id, product order by created_date desc ) as sno
from #t1 ) q
where q.sno > 1
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-11-12 : 11:48:05
quote:
Originally posted by Almir_7

Will this function work in SQL 2000? I'm on SQL 2000 database.


Nope. it wont. then use the below

DELETE t
FROM YourTable t
LEFT JOIN (SELECT Log_id,product,MAX(created_date) AS maxdate
FROM YourTable
GROUP BY Log_id,product) t1
ON t1.Log_id=t.Log_id
AND t1.product=t.product
AND t1.maxdate=t.created_date
WHERE t1.Log_id IS NULL
Go to Top of Page

raky
Aged Yak Warrior

767 Posts

Posted - 2008-11-12 : 11:57:40
delete d
from table d
inner join ( select log_id,product,max(created_Date) as cd from table
group by log_id, product ) q on q.log_id = d.log_id and q.product = d.product and q.cd > d.created_date
Go to Top of Page

Almir_7
Starting Member

14 Posts

Posted - 2008-11-12 : 12:49:59
Thanks visakh16. Your query works. The problem is i have more then just these 3 columns and when i plug them in all the rows still get returned because i'm not able to group by just the log_id and product anymore. And if i plug in all the columns as bellow then all the rows get returned.

SELECT Log_id, Opened_date, closed_date, product_code, specialist_name, customer_name, customer_email, short_description, client_name, call_center, lang_code, family, do_not_survey, employee_id, enterprise_id, customer_country, customer_region, support_organization, assignee_support_group, BPO, Site, Customer_Site, Tier1, Tier2, Tier3, Resolution_category, entry_id, Audit_created_by, audit_status_new, audit_created_date, MAX(Audit_created_date) AS maxdate
FROM dbo.WS_Survey_Staging_DB
GROUP BY Log_id, Opened_date, closed_date, product_code, specialist_name, customer_name, customer_email, short_description, client_name, call_center, lang_code, family, do_not_survey, employee_id, enterprise_id, customer_country, customer_region, support_organization, assignee_support_group, BPO, Site, Customer_Site, Tier1, Tier2, Tier3, Resolution_category, entry_id, Audit_created_by, audit_status_new, audit_created_date

How can i work around this problem?
Go to Top of Page

Almir_7
Starting Member

14 Posts

Posted - 2008-11-12 : 12:58:39
Alright so this does work.

DELETE stage
FROM dbo.WS_Survey_Staging_DB stage
LEFT JOIN (SELECT Log_id,product_code,MAX(audit_created_date) AS maxdate
FROM dbo.WS_Survey_Staging_DB
GROUP BY Log_id,product_code) temp_stage
ON temp_stage.Log_id=stage.Log_id
AND temp_stage.product_code =stage.product_code
AND temp_stage.maxdate=stage.audit_created_date
WHERE temp_stage.Log_id IS NULL

Because the table i join on has only the two columns that are required and the max date and then once i execute the left join on all the columns and the delete then takes care of the rows that don't have a matching row from the other table.

Awsome. Thanks for all your help.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-11-12 : 13:02:31
You're welcome
Go to Top of Page
   

- Advertisement -