Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Delete Duplicates on a Table T-SQL

Author  Topic 

vicpal25
Starting Member

21 Posts

Posted - 2008-01-30 : 11:55:33
I have a table called Engr_BOM_Detail and my data is all screwed up. There are duplicate records coming from other tables. I want to delete these records and the only way to do so would be to target those records by the Engr_BOM_Detail's primary key field which is pEngr_BOM_Detail_ID. I have written the SQL that finds the duplicates on my table now, how can I take those records, identify them by their primary key and delete all and leaving only one entity of each record in my table. I was thinking of dropping these into a temp table but the primary key allocation is going to be the trick.


SELECT fEngr_Item_ID as "Item ID",
COUNT(fEngr_Item_ID) AS NumOccurrences
FROM Engr_BOM_Detail
GROUP BY fEngr_Item_ID, fEngr_BOM_ID
HAVING COUNT(fEngr_Item_ID) > 1




Item ID NumOccurrences
--------------------------------------- --------------
5349 2


if we go to the actual table using:




select pEngr_BOM_Detail_ID, fEngr_BOM_ID, fEngr_Item_ID from Engr_BOM_Detail where fEngr_Item_id=5349



pEngr_BOM_Detail_ID fEngr_BOM_ID Engr_Item_ID
------------------------- ---------------- -----------------
7084 1278 5349
7085 1279 5349
7086 1279 5349
7087 1278 5349
7088 1255 5349
7089 1246 5349


suggestions?

jdaman
Constraint Violating Yak Guru

354 Posts

Posted - 2008-01-30 : 12:26:42
I would suggest thoroughly testing this code against your table and making a copy of your table in its current state before running the code.
IF OBJECT_ID('tempdb..#Engr_BOM_Detail') IS NOT NULL 
DROP TABLE #Engr_BOM_Detail

CREATE TABLE #Engr_BOM_Detail (pEngr_BOM_Detail_ID INT PRIMARY KEY, fEngr_BOM_ID INT, Engr_Item_ID INT)
INSERT #Engr_BOM_Detail (pEngr_BOM_Detail_ID, fEngr_BOM_ID, Engr_Item_ID)
SELECT 7084, 1278, 5349 UNION
SELECT 7085, 1279, 5349 UNION
SELECT 7086, 1279, 5349 UNION
SELECT 7087, 1278, 5349 UNION
SELECT 7088, 1255, 5349 UNION
SELECT 7089, 1246, 5349

SELECT * FROM #Engr_BOM_Detail

SELECT pEngr_BOM_Detail_ID, fEngr_BOM_ID, Engr_Item_ID
INTO #TempTable
FROM ( SELECT ROW_NUMBER() OVER ( PARTITION BY Engr_Item_ID, fEngr_BOM_ID ORDER BY pEngr_BOM_Detail_ID ) AS RowNumber,
pEngr_BOM_Detail_ID, fEngr_BOM_ID, Engr_Item_ID
FROM #Engr_BOM_Detail
) a
WHERE RowNumber = 1

TRUNCATE TABLE #Engr_BOM_Detail

INSERT #Engr_BOM_Detail
SELECT *
FROM #TempTable

DROP TABLE #TempTable

SELECT * FROM #Engr_BOM_Detail
Go to Top of Page

dnf999
Constraint Violating Yak Guru

253 Posts

Posted - 2008-01-30 : 12:33:31
Hi Vicpal

When dealing with dupes, this is the usual approach I go for:

--Create Unique ID
ALTER TABLE Engr_BOM_Detail
ADD SRN numeric identity (1,1)

--Select Max(Unique_ID) for each pEngr_BOM_Detail_ID
SELECT a.* FROM Engr_BOM_Detail a
INNER JOIN
(
SELECT pEngr_BOM_Detail_ID, MAX(SRN) as SRN
FROM Engr_BOM_Detail
GROUP BY pEngr_BOM_Detail_ID
) b
on a.SRN = b.SRN

This should return only the distinct rows.

So basically I add a unique id field for every value in the table and then group by on the dupe field i want and select the maximum SRN associated with that dupe field.


Hope this can help.

Go to Top of Page

vicpal25
Starting Member

21 Posts

Posted - 2008-01-30 : 14:00:28
Forgot to mention i am using sql server 2000 so row_number will not work. ANy replacements? Also, this line :

SELECT 7084, 1278, 5349 UNION
SELECT 7085, 1279, 5349 UNION
SELECT 7086, 1279, 5349 UNION
SELECT 7087, 1278, 5349 UNION
SELECT 7088, 1255, 5349 UNION
SELECT 7089, 1246, 5349

its hard coded there are 1200 records with multiple values in my table
Go to Top of Page

jdaman
Constraint Violating Yak Guru

354 Posts

Posted - 2008-01-30 : 16:22:11
quote:
Originally posted by vicpal25

Forgot to mention i am using sql server 2000 so row_number will not work. ANy replacements? Also, this line :

SELECT 7084, 1278, 5349 UNION
SELECT 7085, 1279, 5349 UNION
SELECT 7086, 1279, 5349 UNION
SELECT 7087, 1278, 5349 UNION
SELECT 7088, 1255, 5349 UNION
SELECT 7089, 1246, 5349

its hard coded there are 1200 records with multiple values in my table



You should be able to replace ROW_NUMBER() with RANK() in sql server 2000 and get the same functionality.

As far as the initial INSERT into #Engr_BOM_Detail statement I am only doing that to build a representation of your actual table. Replace #Engr_BOM_Detail with your table Engr_BOM_Detail and disregard the first insert statement.
Go to Top of Page
   

- Advertisement -