Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Deduping very large table

Author  Topic 

shawnafisher
Starting Member

5 Posts

Posted - 2009-04-21 : 17:48:31
Good afternoon all,

I am working on a script that will remove duplicate records from a very large table (40 mil + records). It is taking forever to accomplish, and I have two questions:
1. Does anyone know of a better way than my query below to accomplish this?
2. Could I add indexes to my _DeDup table to help?

Any guidance would be appreciated!


------------------------------------------------------------------------------------------
--DEDUP tDuplicatesExist
------------------------------------------------------------------------------------------
--Back up the data
Select * into MyDatabase.dbo.tDuplicatesExist_090421 from tDuplicatesExist<br/>
--get data to dedup
Select * into tDuplicatesExist_dedup from tDuplicatesExist--Truncate the original table to insert distinct records back in
Truncate table tDuplicatesExist SET IDENTITY_INSERT MyDatabase.dbo.tDuplicatesExist ON --Keep orig PK/RecordNumber

Insert tDuplicatesExist
Select tDuplicatesExist_DeDup.RecordNumber, --Pk/Unique Identifier
tDuplicatesExist_DeDup.EmployeeID,
tDuplicatesExist_DeDup.AccountID,
tDuplicatesExist_DeDup.TransactionType,
tDuplicatesExist_DeDup.TransactionData, tDuplicatesExist_DeDup.TransactionDate
from tDuplicatesExist
JOIN
( Select distinct EmployeeID,
AccountID,
TransactionType,
convert(datetime,convert(varchar(10),TransactionDate,101))TDate,
Max(RecordNumber) RecordNumber
From tDuplicatesExist_DeDup d
Group by
EmployeeID, AccountID, TransactionType,
convert(datetime,convert(varchar(10),TransactionDate,101)
) as A
on tDuplicatesExist_DeDup.RecordNumber = A.RecordNumber

SET IDENTITY_INSERT MyDatabase.dbo.tDuplicatesExist OFF

sodeep
Master Smack Fu Yak Hacker

7174 Posts

Posted - 2009-04-21 : 17:54:00
Can you delete in batches ?
Go to Top of Page

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2009-04-22 : 04:51:23
I'd be interested to know if this:

(a) works
(b) performs


CREATE TABLE #delList (
[recordNumber] INT PRIMARY KEY
)

INSERT INTO #delList ([recordNumber])
SELECT
dups.[rn]
FROM

(
SELECT
[pos] = ROW_NUMBER() OVER(
PARTITION BY
[EmployeeID]
, [AccountID]
, [TransactionType]
, [TransactionData]
ORDER BY
[RecordNumber])
, [rn] = [recordNumber]
FROM
tDuplicatesExist
)
dups
WHERE
dups.[pos] > 1
GO

DECLARE @rowcount INT
SET @rowcount = 1

WHILE @rowcount > 0 BEGIN
DELETE TOP (50) tde
FROM
tDuplicatesExist tde
JOIN #delList dl ON dl.[recordNumber] = tde.[recordNumber]

SET @rowcount = @@ROWCOUNT
END



Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-04-22 : 04:54:14
First we must now the ratio of the deleted records vs total numbers of records.
If the ratio is high, it's better to create a new table and only insert the valid records, drop the original table and then rename the new table to the name of old table.



E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page
   

- Advertisement -