| Author |
Topic |
|
Cwm
Starting Member
39 Posts |
Posted - 2008-12-15 : 16:33:12
|
| hiI was wondering how i go to delete duplicate data when i go to insert a bunch of data from a csv file?The reason is becuase i run a timer to read an appended text file, seperated by comma, i will get duplicate data that i want to make sure tha it doesn't get inserted again into the table.ThanksChris |
|
|
sodeep
Master Smack Fu Yak Hacker
7174 Posts |
Posted - 2008-12-15 : 16:38:38
|
| Load it into staging table,perform delete duplicate operation and load into SQL Destination table.See if in SQL 2005:http://sqlis.com/post/Get-all-from-Table-A-that-isnt-in-Table-B.aspx |
 |
|
|
Cwm
Starting Member
39 Posts |
Posted - 2008-12-15 : 17:29:24
|
| Hi,That article looks confusing. Is there an easier way?Thanks |
 |
|
|
sodeep
Master Smack Fu Yak Hacker
7174 Posts |
Posted - 2008-12-15 : 17:35:24
|
| How your data look like after loading into staging table? Can you give sample data and output? |
 |
|
|
revdnrdy
Posting Yak Master
220 Posts |
Posted - 2008-12-15 : 18:28:24
|
| Here is how I do it (our database is not normalized btw). This presumes you have a primary ID key (or unique ID key) establised.This query deletes duplicate data based on 2 matching fields (manufacturer and partnumber).This is also known as a self-join operation (I think)DELETE FROM ProductsTable WHERE ID IN(select a.ID from ProductsTABLE a,ProductsTable b where a.manufacturer=b.manufacturer And a.partnumber=b.partnumber And a.ID<b.ID) |
 |
|
|
Cwm
Starting Member
39 Posts |
Posted - 2008-12-15 : 19:27:56
|
| Thanks revdnrdy,I have found something similar to yours, and it isselect distinct TicketNumber into Newtable from [CompletedOrders]delete T1from [CompletedOrders] T1, [CompletedOrders] T3where T1.TicketNumber = T3.TicketNumberand T1.COID > T3.COIDActually it looks pretty identical to what yours is, all i am doing is deleting the duplicates based on two matching fields, the TicketNumber and the autoincrement id.ThanksChris |
 |
|
|
sodeep
Master Smack Fu Yak Hacker
7174 Posts |
Posted - 2008-12-15 : 19:44:49
|
| Test it before you apply with select:Delete tfrom(select row_number() over (partition by TicketNumber order by id)as seq,* from table)twhere t.seq>1 |
 |
|
|
Jai Krishna
Constraint Violating Yak Guru
333 Posts |
Posted - 2008-12-15 : 23:52:39
|
| Try thisDELETE FROM your_tableWHERE autoincrementid NOT IN (SELECT MAX(autoincrementid) FROM your_table GROUP BY TicketNumber)Jai Krishna |
 |
|
|
NeilG
Aged Yak Warrior
530 Posts |
Posted - 2008-12-16 : 10:04:34
|
| Cwm, Either are the same but try to avoid the distinct clause where you can but not vital, many people don't like this being used on large table for perfomance reasons |
 |
|
|
|