Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 How do i delete duplicate data?

Author  Topic 

Cwm
Starting Member

39 Posts

Posted - 2008-12-15 : 16:33:12
hi
I was wondering how i go to delete duplicate data when i go to insert a bunch of data from a csv file?
The reason is becuase i run a timer to read an appended text file, seperated by comma, i will get duplicate data that i want to make sure tha it doesn't get inserted again into the table.

Thanks

Chris

sodeep
Master Smack Fu Yak Hacker

7174 Posts

Posted - 2008-12-15 : 16:38:38
Load it into staging table,perform delete duplicate operation and load into SQL Destination table.

See if in SQL 2005:
http://sqlis.com/post/Get-all-from-Table-A-that-isnt-in-Table-B.aspx
Go to Top of Page

Cwm
Starting Member

39 Posts

Posted - 2008-12-15 : 17:29:24
Hi,
That article looks confusing. Is there an easier way?

Thanks
Go to Top of Page

sodeep
Master Smack Fu Yak Hacker

7174 Posts

Posted - 2008-12-15 : 17:35:24
How your data look like after loading into staging table? Can you give sample data and output?
Go to Top of Page

revdnrdy
Posting Yak Master

220 Posts

Posted - 2008-12-15 : 18:28:24
Here is how I do it (our database is not normalized btw). This presumes you have a primary ID key (or unique ID key) establised.

This query deletes duplicate data based on 2 matching fields (manufacturer and partnumber).

This is also known as a self-join operation (I think)

DELETE
FROM ProductsTable
WHERE ID
IN(select a.ID
from ProductsTABLE a,ProductsTable b
where a.manufacturer=b.manufacturer
And a.partnumber=b.partnumber
And a.ID<b.ID)
Go to Top of Page

Cwm
Starting Member

39 Posts

Posted - 2008-12-15 : 19:27:56
Thanks revdnrdy,
I have found something similar to yours, and it is

select distinct TicketNumber into Newtable from [CompletedOrders]
delete T1
from [CompletedOrders] T1, [CompletedOrders] T3
where T1.TicketNumber = T3.TicketNumber
and T1.COID > T3.COID

Actually it looks pretty identical to what yours is, all i am doing is deleting the duplicates based on two matching fields, the TicketNumber and the autoincrement id.

Thanks

Chris
Go to Top of Page

sodeep
Master Smack Fu Yak Hacker

7174 Posts

Posted - 2008-12-15 : 19:44:49
Test it before you apply with select:

Delete t
from
(select row_number() over (partition by TicketNumber order by id)as seq,* from table)t
where t.seq>1
Go to Top of Page

Jai Krishna
Constraint Violating Yak Guru

333 Posts

Posted - 2008-12-15 : 23:52:39
Try this

DELETE FROM your_table
WHERE autoincrementid NOT IN (SELECT MAX(autoincrementid) FROM your_table GROUP BY TicketNumber)

Jai Krishna
Go to Top of Page

NeilG
Aged Yak Warrior

530 Posts

Posted - 2008-12-16 : 10:04:34
Cwm,
Either are the same but try to avoid the distinct clause where you can but not vital, many people don't like this being used on large table for perfomance reasons
Go to Top of Page
   

- Advertisement -