Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 De-duping table

Author  Topic 

GhantaBro
Posting Yak Master

215 Posts

Posted - 2009-08-20 : 06:24:50
I have a table with 4 billion records which I need to dedupe... (not many duplicates though)..

which is more efficient..

Select distinct * into new_tbl from old_table
drop old_table
rename new_tbl

or


deduping is faster by deleting duplicate records from the table?

Thanks!

DonAtWork
Master Smack Fu Yak Hacker

2167 Posts

Posted - 2009-08-20 : 06:54:54
i can guess which is less disk intensive

http://weblogs.sqlteam.com/jeffs/archive/2008/05/13/question-needed-not-answer.aspx
How to ask: http://weblogs.sqlteam.com/brettk/archive/2005/05/25.aspx

For ultra basic questions, follow these links.
http://www.sql-tutorial.net/
http://www.firstsql.com/tutor.htm
http://www.w3schools.com/sql/default.asp
Go to Top of Page

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2009-08-20 : 09:39:17
If depends, but if the number of dupes is small relative to the size of the table, delete will be faster.

It will probably be more work to find the dupes than to delete them.


CODO ERGO SUM
Go to Top of Page

DonAtWork
Master Smack Fu Yak Hacker

2167 Posts

Posted - 2009-08-20 : 11:05:55
Either way you do it, make sure you implement a way to PREVENT dupes from getting in there!

http://weblogs.sqlteam.com/jeffs/archive/2008/05/13/question-needed-not-answer.aspx
How to ask: http://weblogs.sqlteam.com/brettk/archive/2005/05/25.aspx

For ultra basic questions, follow these links.
http://www.sql-tutorial.net/
http://www.firstsql.com/tutor.htm
http://www.w3schools.com/sql/default.asp
Go to Top of Page

shebert
Yak Posting Veteran

85 Posts

Posted - 2009-08-20 : 14:27:08
One thing you can do is find the duplicates and copy single instance into a temp table.
with query like this

select assembly_lot_id, count(assembly_lot_id) as occurs
from assembly_lot group by assembly_lot_id having (count(assembly_lot_id)>1)


then delete the duplicate records in your table
then replace the single records back in unduplicated
Go to Top of Page
   

- Advertisement -