De-duping table - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

New to SQL Server Programming

De-duping table

Author

Topic

GhantaBro
Posting Yak Master

215 Posts

Posted - 2009-08-20 : 06:24:50

I have a table with 4 billion records which I need to dedupe... (not many duplicates though)..

which is more efficient..

Select distinct * into new_tbl from old_table
drop old_table
rename new_tbl

or

deduping is faster by deleting duplicate records from the table?

Thanks!

DonAtWork
Master Smack Fu Yak Hacker

2167 Posts

Posted - 2009-08-20 : 06:54:54

i can guess which is less disk intensive

http://weblogs.sqlteam.com/jeffs/archive/2008/05/13/question-needed-not-answer.aspx
How to ask: http://weblogs.sqlteam.com/brettk/archive/2005/05/25.aspx
For ultra basic questions, follow these links.
http://www.sql-tutorial.net/
http://www.firstsql.com/tutor.htm
http://www.w3schools.com/sql/default.asp

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2009-08-20 : 09:39:17

If depends, but if the number of dupes is small relative to the size of the table, delete will be faster.

It will probably be more work to find the dupes than to delete them.

CODO ERGO SUM

DonAtWork
Master Smack Fu Yak Hacker

2167 Posts

Posted - 2009-08-20 : 11:05:55

Either way you do it, make sure you implement a way to PREVENT dupes from getting in there!

http://weblogs.sqlteam.com/jeffs/archive/2008/05/13/question-needed-not-answer.aspx
How to ask: http://weblogs.sqlteam.com/brettk/archive/2005/05/25.aspx
For ultra basic questions, follow these links.
http://www.sql-tutorial.net/
http://www.firstsql.com/tutor.htm
http://www.w3schools.com/sql/default.asp

shebert
Yak Posting Veteran

85 Posts

Posted - 2009-08-20 : 14:27:08

One thing you can do is find the duplicates and copy single instance into a temp table.
with query like this

select assembly_lot_id, count(assembly_lot_id) as occurs
from assembly_lot group by assembly_lot_id having (count(assembly_lot_id)>1)

then delete the duplicate records in your table
then replace the single records back in unduplicated

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources