SQL Server Forums

 Article: Deleting Duplicate Records

Drummos
Starting Member

USA
4 Posts

Posted - 02/11/2007 :  14:39:14
The article on deleting duplicates worked well for me. But now, every time I try to create the PK, a new duplicate record is created. I'm not sure what's going on; maybe it's a technical issue in SQL 2005?

Ally

graz
Chief SQLTeam Crack Dealer

USA
4138 Posts

Posted - 02/11/2007 :  23:48:34
How are you creating the primary key?

===============================================
Creating tomorrow's legacy systems today.
One crisis at a time.

SwePeso
Patron Saint of Lost Yaks

Sweden
30282 Posts

Posted - 04/16/2007 :  04:49:17
How do all these methods compare to this one, which I discovered today?
DELETE t1
FROM (
        SELECT ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col2 DESC) AS RecID
        FROM Table1
     ) AS t1
WHERE RecID > 1


Peter Larsson
Helsingborg, Sweden

peter3286
Starting Member

1 Posts

Posted - 04/16/2007 :  07:20:17
First, get the list of duplicates to delete:

SELECT col1, col2, count(*)
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1

jcarnes
Starting Member

USA
5 Posts

Posted - 05/27/2007 :  14:17:23
I'm trying to use this command from above to delete dupes:
____________________

DELETE FROM [Table with Duplicates]
WHERE [Primary Key Field] IN
(
SELECT a.[Primary Key Field]
FROM [Table with Duplicates] a,
[Table with Duplicates] b

WHERE a.[Primary Key Field]!= b.[Primary Key Field] -- i.e. Userkey
AND a.[Value to check]= b.[Value to Check] -- i.e. Lastname
AND a.[Second Value to Check] = b.[Second Value to Check] -- i.e. Firstname
AND a.[Primary Key Field] < b.[Primary Key Field] -- i.e. Userkey
)
____________________

Unfortunately, I'm so new to SQL that I don't even know what to put in the various fields. My relevant fields are: id, title, introtext, and sectionid.

I'd like this query to check for dupes in "title" and "introtext" and if it finds any AND the "sectionid" is identical, then I'd like to delete the oldest entry and leave the newest. If the "sectionid" won't work with this, it's not a big deal. If the date check won't work, it isn't a big deal either. Trashing any dupes is better than having them and they're usually within a day of each other anyway.

If anyone can give me some guidance on this I'd really appreciate it. Thanks!
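A sketch of how the placeholders in the quoted query might map onto those columns. The table name `articles` is an assumption (the post doesn't name the table), as is `id` being an auto-incrementing key so the newest row has the highest `id`. Untested, so run it against a backup first:

```sql
-- Hypothetical table name: articles
-- For each group of rows sharing title, introtext and sectionid,
-- delete every row except the one with the highest id (the newest).
DELETE FROM articles
WHERE id IN
(
    SELECT a.id
    FROM articles a, articles b
    WHERE a.title = b.title
      AND a.introtext = b.introtext
      AND a.sectionid = b.sectionid
      AND a.id < b.id
)
```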

rakesh koyadi
Starting Member

India
3 Posts

Posted - 05/30/2007 :  15:51:03
Hi,
First set the rowcount to 1, then delete the duplicate record from the table, and then set the rowcount back to 0. See the example below:

create table tab12 (a int)

select * from tab12

insert into tab12 values (1)
insert into tab12 values (1)

--set the rowcount depending on the number of duplicates you want to delete

set rowcount 1
delete from tab12 where a=1
set rowcount 0

select * from tab12
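On SQL Server 2005 and later, the same effect can be had with DELETE TOP instead of SET ROWCOUNT (whose effect on DELETE/INSERT/UPDATE is deprecated). A minimal sketch against the same tab12 table:

```sql
-- Delete one of the two duplicate rows, leaving a single copy of the value 1
DELETE TOP (1) FROM tab12 WHERE a = 1
```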

Rakesh

jcarnes
Starting Member

USA
5 Posts

Posted - 05/30/2007 :  16:17:29
I'm really red-faced here. Turns out I have MySQL and therefore this has wasted your time. I'm sorry. I am so new with SQL that I didn't know there were even different flavors. Thanks for the response!

aruntom
Starting Member

India
1 Posts

Posted - 07/11/2007 :  03:22:18
If we use DISTINCT, it will not work when the table has a text, ntext, or image data type; we will get the error:
The text, ntext, or image data type cannot be selected as DISTINCT.

So add another identity column, then search for the duplicated data and delete it using that new ID. It will definitely work.
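That suggestion might look like the following sketch. The table name `Docs` and the key column `title` are assumptions; the point is that the text/ntext/image column itself is never compared:

```sql
-- Add a surrogate identity column so each row is individually addressable
ALTER TABLE Docs ADD rowid INT IDENTITY(1,1)
GO

-- Among rows sharing the same key column(s), keep only the lowest rowid
DELETE a
FROM Docs a
JOIN Docs b
  ON a.title = b.title
WHERE a.rowid > b.rowid
GO

-- Drop the helper column when done
ALTER TABLE Docs DROP COLUMN rowid
```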

Regards,
Arun Thomas


ilg1976
Starting Member

1 Posts

Posted - 07/16/2007 :  23:44:53
I am not an expert in SQL Server. When I had a situation where I did not have any values to check, I used the following query to find duplicate values in a particular column.

SELECT column FROM table
GROUP BY column
HAVING COUNT(column) > 1

I do not know much about performance; it took 20 seconds to find one duplicate value among 2 million records.

khtan
In (Som, Ni, Yak)

Singapore
17681 Posts

Posted - 07/16/2007 :  23:49:16
quote:
Originally posted by ilg1976

I am not an expert in SQL Server. When I had a situation where I did not have any values to check, I used the following query to find duplicate values in a particular column.

SELECT column FROM table
GROUP BY column
HAVING COUNT(column) > 1

I do not know much about performance; it took 20 seconds to find one duplicate value among 2 million records.


If you have an index on the column, it will be faster.
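For example (the table and column names are placeholders):

```sql
-- A nonclustered index on the grouped column lets SQL Server satisfy the
-- GROUP BY from the index rather than scanning the whole table
CREATE NONCLUSTERED INDEX IX_MyTable_MyColumn ON MyTable (MyColumn)
```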


KH
Time is always against us


tradergordo
Starting Member

2 Posts

Posted - 03/11/2008 :  08:45:41
I've enjoyed this thread; so many ways to do the same task. For the benefit of future readers, I wanted to submit a tweak of what looks like the most popular solution posted in this thread. That query actually has a subselect, which can be bad for performance; it also didn't use ANSI joins, and it has redundant criteria in the WHERE clause. This is my improved version:

DELETE a
FROM [Table with Duplicates] a join [Table with Duplicates] b
ON a.[Value to check]= b.[Value to Check] -- i.e. Lastname
AND a.[Second Value to Check] = b.[Second Value to Check] -- i.e. Firstname
WHERE
a.[Primary Key Field] < b.[Primary Key Field] -- i.e. Userkey


Yes, it's that simple: just a self join on all the columns that should never be duplicated, with a less-than (you can also use a greater-than) in the WHERE clause on the primary key.

[Note that if you have nullable columns in your table that you also want to compare, AND you want null to equal null, then you may have to add an extra step or add to your where clause].

I also really liked the primary-key-building SQL code from the original article. If you don't already have a primary key on the table you are de-duping, you can build it with this:

alter table [Table with Duplicates] add PK int NULL
declare @intCounter int
set @intCounter = 0
update [Table with Duplicates]
SET @intCounter = PK = @intCounter + 1


OK so this is a good basic framework, and this is going to work fine in most situations, even on most large tables. But so far NONE of the queries posted will work in extreme situations where you have limited disk space and extremely large tables with a large number of dups. I had to "de-dup" a very wide table with 300 million rows, an unknown number of dups, limited free fixed disk space, and no primary key! This is a worst case scenario. I'll post the query I used to do it in the next post.

tradergordo
Starting Member

2 Posts

Posted - 03/11/2008 :  09:51:13
OK - so you have limited disk space, massive tables, and an unknown number of dups. What can you do? The idea is basically the same as my previous post, you just need to find a way to break things up into chunks so that you do not fill up your log (even if you use simple logging).

There are a number of ways you can break things up. You can use a loop with ROWCOUNT, which results in cleaner and more concise code, but that is not going to work for de-duping, and it's often faster to take advantage of whatever indexes you already have on your table (and every table should have at least a clustered index).

/* This version uses just one nesting level, and an increment value of one. To break the data up even more, additional nesting levels may be required (code to follow) and/or you may need to use an increment greater than one (second example shows how to do this). */

alter table [Table with Duplicates] add PK int NULL

declare @i int
declare @Maxi int
declare @intCounter int
set @intCounter = 0
select @i = min([indexed Column 1]) from [Table with Duplicates] /* Example: A time ID column */
select @Maxi = max([indexed Column 1])+1 from [Table with Duplicates]

while @i<@Maxi
begin
update [Table with Duplicates]
SET @intCounter = PK = @intCounter + 1
where [indexed Column 1] = @i
checkpoint /* If you are using truncate on checkpoint (simple logging) this will ensure the log space is free */
--Otherwise you can manually truncate the log at this point (substitute your log file name below):
--dbcc shrinkfile ([CurrentDB_log], 1000, truncateonly)
select @i = @i+1
end


/* This same concept can be expanded to chunk things up even more using additional nesting levels. The code might change based on how many values you have in the column you are using to chunk things up; for example, you might increment by 1 or by 100,000. Here's an example with two nesting levels: */

alter table [Table with Duplicates] add PK int NULL

declare @i int
declare @Maxi int
declare @j int
declare @Maxj int
declare @jIncrement int
declare @intCounter int
set @intCounter = 0
set @jIncrement = 10000
select @i = min([indexed Column 1]) from [Table with Duplicates] /* Example: A time ID column */
select @Maxi = max([indexed Column 1])+1 from [Table with Duplicates]
select @j = 0
select @Maxj = max([indexed Column 2])+@jIncrement from [Table with Duplicates]

while @i<@Maxi
begin
while @j<@Maxj
begin
update [Table with Duplicates]
SET @intCounter = PK = @intCounter + 1
where [indexed Column 1] = @i and [indexed Column 2]>=@j AND [indexed Column 2]<@j+@jIncrement

checkpoint /* If you are using truncate on checkpoint (simple logging) this will ensure the log space is free */
--Otherwise you can manually truncate the log at this point (substitute your log file name below):
--dbcc shrinkfile ([CurrentDB_log], 1000, truncateonly)

select @j = @j+@jIncrement
end
select @i = @i+1
set @j=0
end




So the above concepts can be applied to the actual de-duping code as well, and you end up with something like this:
/* This version uses just one nesting level, and an increment value of one. To break the data up even more, additional nesting levels may be required (code to follow) and/or you may need to use an increment greater than one (second example shows how to do this). */

declare @i int
declare @Maxi int
declare @intCounter int
set @intCounter = 0
select @i = min([indexed Column 1]) from [Table with Duplicates] /* Example: A time ID column */
select @Maxi = max([indexed Column 1])+1 from [Table with Duplicates]

while @i<@Maxi
begin
DELETE a
FROM [Table with Duplicates] a join [Table with Duplicates] b
ON a.[Value to check]= b.[Value to Check] -- i.e. Lastname
AND a.[Second Value to Check] = b.[Second Value to Check] -- i.e. Firstname
WHERE
a.[Primary Key Field] < b.[Primary Key Field] -- i.e. Userkey
AND a.[indexed Column 1] = @i

checkpoint /* If you are using truncate on checkpoint (simple logging) this will ensure the log space is free */
--Otherwise you can manually truncate the log at this point (substitute your log file name below):
--dbcc shrinkfile ([CurrentDB_log], 1000, truncateonly)

select @i = @i+1
end


And again, an example breaking it up even further:
declare @i int
declare @Maxi int
declare @j int
declare @Maxj int
declare @jIncrement int
declare @intCounter int
set @intCounter = 0
set @jIncrement = 10000
select @i = min([indexed Column 1]) from [Table with Duplicates] /* Example: A time ID column */
select @Maxi = max([indexed Column 1])+1 from [Table with Duplicates]
select @j = 0
select @Maxj = max([indexed Column 2])+@jIncrement from [Table with Duplicates]

while @i<@Maxi
begin
while @j<@Maxj
begin
DELETE a
FROM [Table with Duplicates] a join [Table with Duplicates] b
ON a.[Value to check]= b.[Value to Check] -- i.e. Lastname
AND a.[Second Value to Check] = b.[Second Value to Check] -- i.e. Firstname
WHERE
a.[Primary Key Field] < b.[Primary Key Field] -- i.e. Userkey
AND a.[indexed Column 1] = @i
AND a.[indexed Column 2]>=@j AND a.[indexed Column 2]<@j+@jIncrement

checkpoint /* If you are using truncate on checkpoint (simple logging) this will ensure the log space is free */
--Otherwise you can manually truncate the log at this point (substitute your log file name below):
--dbcc shrinkfile ([CurrentDB_log], 1000, truncateonly)
select @j = @j+@jIncrement
end
select @i = @i+1
set @j=0
end


shaileshk
Starting Member

India
6 Posts

Posted - 04/11/2008 :  04:46:51
If you want to remove duplicate records with ntext, text, or image data types, please follow the steps given at:

http://www.codegroups.com/blog/index.php/deleting-duplicate-records-from-table-with-text-ntext-or-image-data-type/

I hope this helps!

Regards,
Shaileshk


shailesh kavathiya
http://www.codegroup.com/blog

cognecy
Starting Member

1 Posts

Posted - 06/03/2008 :  02:10:23
The NTEXT data type is a real thorn in my side when it comes to removing duplicate rows from a table. I am trying to remove about 7 duplicate rows within my SQL 2005 table using a query. The last column in the table is of the ntext data type. I learned the technique I show below from a posting in another forum some time ago, and it works great as long as the table does not have any ntext data types in it. Please take a look at my query below and tell me what I must do to tweak it to work with the one ntext column.


--Created "Temporary" Table to store Duplicate Data
CREATE TABLE #TEMPDUPTABLE (
[idOrder] [int] NULL ,
[cardtype] [nvarchar] (50) NULL ,
[cardnumber] [nvarchar] (100) NULL ,
[expiration] [datetime] NULL ,
[seqcode] [nvarchar] (10) NULL ,
[comments] [ntext] NULL --NTEXT field is the problem
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY] --Thought this would work

--Identify Duplicate Data and save to "Temporary" table
INSERT INTO #TEMPDUPTABLE
SELECT * FROM dbo.creditCards
GROUP BY idOrder,cardtype,cardnumber,expiration,seqcode,comments
HAVING COUNT(*) > 1

--Confirmation of duplicate rows
SELECT @@ROWCOUNT AS 'Number of Duplicate Rows'

--Delete duplicates from the original table
DELETE FROM dbo.creditCards
FROM dbo.creditCards
INNER JOIN #TEMPDUPTABLE
ON dbo.creditCards.idOrder = #TEMPDUPTABLE.idOrder
AND dbo.creditCards.cardtype = #TEMPDUPTABLE.cardtype
AND dbo.creditCards.cardnumber = #TEMPDUPTABLE.cardnumber
AND dbo.creditCards.expiration = #TEMPDUPTABLE.expiration
AND dbo.creditCards.seqcode = #TEMPDUPTABLE.seqcode
AND dbo.creditCards.comments = #TEMPDUPTABLE.comments

--Insert the "Cleaned" data back into original table
INSERT INTO dbo.creditCards
SELECT * FROM #TEMPDUPTABLE

--Check once more for duplicate data.
SELECT * FROM dbo.creditCards
GROUP BY idOrder,cardtype,cardnumber,expiration,seqcode,comments
HAVING COUNT(*) > 1

--QA Check the table
SELECT * FROM dbo.creditCards

--Drop the "Temporary" table
DROP TABLE #TEMPDUPTABLE
GO
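One workaround (a sketch, untested against this schema): on SQL Server 2005 the comparisons can go through nvarchar(max), which, unlike ntext, is legal in GROUP BY and in equality joins. The grouping step would become:

```sql
-- CONVERT the ntext column so it can participate in GROUP BY
INSERT INTO #TEMPDUPTABLE
SELECT idOrder, cardtype, cardnumber, expiration, seqcode,
       CONVERT(nvarchar(max), comments)
FROM dbo.creditCards
GROUP BY idOrder, cardtype, cardnumber, expiration, seqcode,
         CONVERT(nvarchar(max), comments)
HAVING COUNT(*) > 1
```

and the join in the DELETE would compare `CONVERT(nvarchar(max), dbo.creditCards.comments) = CONVERT(nvarchar(max), #TEMPDUPTABLE.comments)` the same way. Longer term, ntext is deprecated in favor of nvarchar(max) anyway, so altering the column may be the cleaner fix.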

zhshqzyc
Posting Yak Master

240 Posts

Posted - 09/08/2008 :  12:08:03
I have a table variable @x that has only one column.
How do I remove the duplicate records in this specific case?

Thanks.

Select * From @x

19/20" TV
25/27" TV
FRONT PROJECTION TV
ANALOG PROJECTION TV
BLACK & WHITE TV
TV DISPLAY PRODUCT
13" TV
31"+ ANALOG TUBE TV
COMBO TV
UP TO 20 LCD TV
LESS 29 DIG TUBE TV
21-29 LCD TV
...
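Since this is a single-column table variable on SQL Server 2005 (inferred from the thread), one way is to delete through a CTE over the table variable, keeping one copy of each value. A sketch; the column name `item` is an assumption:

```sql
-- Number the copies of each value; rows numbered 2 and up are duplicates
;WITH d AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY item ORDER BY item) AS rn
    FROM @x
)
DELETE FROM d
WHERE rn > 1
```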

manus
Starting Member

India
1 Posts

Posted - 08/27/2010 :  07:10:00
DELETE DUPLICATE ROWS WHEN THERE IS NO INDEXING COLUMN IN THE TABLE, WITHOUT USING TEMP OR EXTRA TABLES

STEPS:
Consider a table "test" with Column1 and column2
column1 column2
1 h
2 c
1 h
3 g
2 c
8 y

1. Find the total count of rows in the table and store it in a variable, say @count

declare @count int
select @count =COUNT(*) from test
select @count ---here it is 6

2. Find the rows which are repeated and insert them into the "test" table itself

insert into test select column1 ,column2
FROM test
GROUP BY column1,column2
HAVING ( COUNT(column1) > 1 and COUNT(column2)> 1)
3. Find the rows which occur only once and insert them into the "test" table itself

insert into test select column1 ,column2
FROM test
GROUP BY column1,column2
HAVING ( COUNT(column1) =1 and COUNT(column2)=1)
The table content now will be
column1 column2
1 h
2 c
1 h
3 g
2 c
8 y
2 c
1 h
3 g
8 y

4. Now delete the TOP (@count) rows and you will get the final result

DELETE TOP (@count) FROM test

column1 column2
2 c
1 h
3 g
8 y


Manu

virenderverma
Starting Member

India
1 Posts

Posted - 02/21/2011 :  05:20:49
CREATE TABLE Users
(
FirstName nvarchar(50),
LastName nvarchar(50)
)
GO

INSERT INTO Users (FirstName, LastName) VALUES (N'Eralper',N'Yilmaz')
INSERT INTO Users (FirstName, LastName) VALUES (N'Elvis',N'Presley')
INSERT INTO Users (FirstName, LastName) VALUES (N'Red',N'Kit')
INSERT INTO Users (FirstName, LastName) VALUES (N'Jane',N'Fonda')
INSERT INTO Users (FirstName, LastName) VALUES (N'Red',N'Kit')
INSERT INTO Users (FirstName, LastName) VALUES (N'James',N'White')
INSERT INTO Users (FirstName, LastName) VALUES (N'Alan',N'Black')
INSERT INTO Users (FirstName, LastName) VALUES (N'Elvis',N'Presley')
INSERT INTO Users (FirstName, LastName) VALUES (N'Jane',N'Fonda')
INSERT INTO Users (FirstName, LastName) VALUES (N'Elvis',N'Presley')
INSERT INTO Users (FirstName, LastName) VALUES (N'Jane',N'Fonda')
INSERT INTO Users (FirstName, LastName) VALUES (N'Elvis',N'Presley')


while ( (SELECT top 1 count(*) FROM Users GROUP BY FirstName, LastName HAVING Count(*) > 1) != 1 )
begin

    set rowcount 1
    delete Users
    where FirstName in
        (SELECT FirstName FROM Users GROUP BY FirstName, LastName HAVING Count(*) > 1)

end
set rowcount 0  -- restore normal behavior for subsequent statements

expert
Starting Member

4 Posts

Posted - 09/19/2011 :  11:33:20
This link describes different methods of deleting duplicate records:
http://www.besttechtools.com/SQLArticles.aspx?ID=DeleteDuplicate

maeenul
Starting Member

20 Posts

Posted - 09/22/2011 :  06:36:12
For future reference, I am just giving one good solution, probably the easiest. It uses a CTE. Although the URL says SQL Server 2008, it will also work on SQL Server 2005.

http://blog.programmingsolution.net/sql-server-2008/tsql/easiest-way-to-find-and-delete-duplicate-rows-from-a-table-using-cte-in-sql-server/


-----------------------
maeenul
http://www.programmingsolution.net/sqlserver2005/sqlserver-solutions/
http://sqlservertipsntricks.blogspot.com

SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC