removing semi duplicate rows. MS-SQL DELETE

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2008 Forums

Transact-SQL (2008)

removing semi duplicate rows. MS-SQL DELETE

Author

Topic

ernest.l.arrow
Starting Member

4 Posts

Posted - 2011-10-24 : 10:40:24

I have wide and long table with semi dups.

A duplicate is defined as if there are two rows in which row1.column3=row2.column3 and
row1.column1=row2.column2 and
row2.column1=row1.column2

Other columns are not important.
I don't care if column1=column2 or not, I have control over that.

How do I delete every other row?
Each row has apx. 1 dup, therefore, I'll remain with about half of the data.

I thought of:
myIndexColumn = Column1 + Column2 + Column3
myReversedIndexColumn = Column2 + Column1 + Column3

CREATE NONCLUSTERED INDEX myIndex1 ON myDB.dbo.myTable (
myIndexColumn ASC
)
CREATE NONCLUSTERED INDEX myIndex2 ON myDB.dbo.myTable (
myReversedIndexColumn ASC
)

DELETE FROM myDB.dbo.myTable
WHERE myIndexColumn in (SELECT myReversedIndex FROM myDB.dbo.myTable)

but for that I need the SQL SERVER to make a GO after each row tested (and deleted).

How do I do that?

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2011-10-24 : 11:07:12

you just need this


SELECT *
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY column1,column2,column3 ORDER BY column1) AS Rn,*
FROM Table
)t
WHERE Rn >1

check and ensure this returns duplicate occurances.

then use it in delete


DELETE t
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY column1,column2,column3 ORDER BY column1) AS Rn
FROM Table
)t
WHERE Rn >1

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

ernest.l.arrow
Starting Member

4 Posts

Posted - 2011-10-24 : 13:07:53

Thanks, but (0 row(s) affected).

Here is an example to split by space:

Column1 Column2 Column3 Columns4-8 Columns9-12 myIndex Decision
David Angela 82 15 53 David+Angela+82 Keep(Single)
David Angela 89 15 53 David+Angela+89 Keep(Single)
David Don 85 15 63 Don+David+85 Keep(First)
Don David 85 63 15 Don+David+85 Remove(Second)
David Don 92 15 63 Don+David+92 Keep(First)
Don David 92 63 15 Don+David+92 Remove(Second)
Ernest Ernest 83 47 47 Ernest+Ernest+83 Keep(Single)(Column1=Column2)
Ernest Ernest 90 47 47 Ernest+Ernest+90 Keep(Single)(Column1=Column2)

quote:
Originally posted by visakh16

you just need this

SELECT *
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY column1,column2,column3 ORDER BY column1) AS Rn,*
FROM Table
)t
WHERE Rn >1

check and ensure this returns duplicate occurances.

then use it in delete

DELETE t
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY column1,column2,column3 ORDER BY column1) AS Rn
FROM Table
)t
WHERE Rn >1

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2011-10-24 : 13:40:21

when you say first second etc on what basis you identify them? is there a unique valued filed you can use for that?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

namman
Constraint Violating Yak Guru

285 Posts

Posted - 2011-10-24 : 14:17:17

Using subquery to generate a new column that indicates the duplicated rows, then use that column for partition as visakh16 presents.

If you need more help on this issue, let me know.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources