| Author |
Topic  |
|
Drummos
Starting Member
USA
4 Posts |
Posted - 02/11/2007 : 14:39:14
|
The article on deleting duplicates worked well for me. But, now every time I try to create the PK a new duplicate record is created. I'm not sure what's going on maybe it's a technical issue in SQL 2005?
Ally |
 |
|
|
graz
Chief SQLTeam Crack Dealer
USA
4128 Posts |
Posted - 02/11/2007 : 23:48:34
|
How are you creating the primary key?
=============================================== Creating tomorrow's legacy systems today. One crisis at a time. |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
Sweden
29138 Posts |
Posted - 04/16/2007 : 04:49:17
|
How do all these methods compare to this I discovered today?DELETE t1
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col2 DESC) AS RecID
FROM Table1
) AS t1
WHERE RecID > 1
Peter Larsson Helsingborg, Sweden |
 |
|
|
peter3286
Starting Member
1 Posts |
Posted - 04/16/2007 : 07:20:17
|
get the list of duplicate delete
SELECT col1, col2, count(*) FROM t1 GROUP BY col1, col2 HAVING count(*) > 1 |
 |
|
|
jcarnes
Starting Member
USA
5 Posts |
Posted - 05/27/2007 : 14:17:23
|
I'm trying to use this command from above to delete dupes: ____________________
DELETE FROM [Table with Duplicates] WHERE [Primary Key Field] IN ( SELECT a.[Primary Key Field] FROM [Table with Duplicates] a, [Table with Duplicates] b
WHERE a.[Primary Key Field]!= b.[Primary Key Field] -- i.e. Userkey AND a.[Value to check]= b.[Value to Check] -- i.e. Lastname AND a.[Second Value to Check] = b.[Second Value to Check] -- i.e. Firstname AND a.[Primary Key Field] < b.[Primary Key Field] -- i.e. Userkey ) ____________________
Unfortunately, I'm so new to SQL that I don't even know what to put in the various fields. My relevant fields are: id, title, introtext, and sectionid.
I'd like this query to check for dupes in "title" and "introtext" and if it finds any AND the "sectionid" is identical, then I'd like to delete the oldest entry and leave the newest. If the "sectionid" won't work with this, it's not a big deal. If the date check won't work, it isn't a big deal either. Trashing any dupes is better than having them and they're usually within a day of each other anyway.
If anyone can give me some guidance on this I'd really appreciate it. Thanks! |
 |
|
|
rakesh koyadi
Starting Member
India
3 Posts |
Posted - 05/30/2007 : 15:51:03
|
Hi, First set the row count to 1 and then delete the duplicate record from the table and then again set the rowcount to 0. see the below example :
create table tab12 (a int)
select * from tab12
insert into tab12 values (1) insert into tab12 values (1)
--set the rowcount depending on the number of duplicates you want to delete
set rowcount 1 delete from tab12 where a=1 set rowcount 0
select * from tab12
Rakesh
rakesh |
 |
|
|
jcarnes
Starting Member
USA
5 Posts |
Posted - 05/30/2007 : 16:17:29
|
| I'm really red-faced here. Turns out I have MySQL and therefore this has wasted your time. I'm sorry. I am so new with SQL that I didn't know there were even different flavors. Thanks for the response! |
 |
|
|
aruntom
Starting Member
India
1 Posts |
Posted - 07/11/2007 : 03:22:18
|
If we use DISTINCT It will not work if our table has text, ntext, or image data type we wil get an error as: The text, ntext, or image data type cannot be selected as DISTINCT.
so add another identity column.. then search for replicating data and delete with that new ID.. It will work definitely..
regard Arun Thomas
|
 |
|
|
ilg1976
Starting Member
1 Posts |
Posted - 07/16/2007 : 23:44:53
|
I am not expert in sql server. When i had a situation where i do not have any values to check, i used the following query to find duplicate values in particular column.
SELECT column FROM table GROUP BY column HAVING COUNT(column) > 1
I do not know well about performance. It took 20 seconds to find 1 duplicate column from 2 million records. |
 |
|
|
khtan
In (Som, Ni, Yak)
Singapore
16746 Posts |
Posted - 07/16/2007 : 23:49:16
|
quote: Originally posted by ilg1976
I am not expert in sql server. When i had a situation where i do not have any values to check, i used the following query to find duplicate values in particular column.
SELECT column FROM table GROUP BY column HAVING COUNT(column) > 1
I do not know well about performance. It took 20 seconds to find 1 duplicate column from 2 million records.
if you have index on the column, it will be faster.
KH Time is always against us
|
 |
|
|
tradergordo
Starting Member
2 Posts |
Posted - 03/11/2008 : 08:45:41
|
I've enjoyed this thread, so many ways to do the same task. For the benefit of future readers - I wanted to submit a tweak of what looks like the most popular solution posted in this thread - that query actually has a subselect, which can be bad for performance, it also didn't use ansi joins, and it has redundant criteria in the where clause. This is my improved version:
DELETE a FROM [Table with Duplicates] a join [Table with Duplicates] b ON a.[Value to check]= b.[Value to Check] -- i.e. Lastname AND a.[Second Value to Check] = b.[Second Value to Check] -- i.e. Firstname WHERE a.[Primary Key Field] < b.[Primary Key Field] -- i.e. Userkey
Yes, its that simple, just a self join on all the columns that should never be dupicated, with a less than (can also use greater than) in the where clause on the primary key.
[Note that if you have nullable columns in your table that you also want to compare, AND you want null to equal null, then you may have to add an extra step or add to your where clause].
I also really liked the primary key building sql code from the original article. If you don't already have a primary key on the table you are de-duping, you can build it with this:
alter table [Table with Duplicates]add PK int NULL declare @intCounter int set @intCounter = 0 update [Table with Duplicates] SET @intCounter = PK = @intCounter + 1
OK so this is a good basic framework, and this is going to work fine in most situations, even on most large tables. But so far NONE of the queries posted will work in extreme situations where you have limited disk space and extremely large tables with a large number of dups. I had to "de-dup" a very wide table with 300 million rows, an unknown number of dups, limited free fixed disk space, and no primary key! This is a worst case scenario. I'll post the query I used to do it in the next post.
|
 |
|
|
tradergordo
Starting Member
2 Posts |
Posted - 03/11/2008 : 09:51:13
|
OK - so you have limited disk space, massive tables, and an unknown number of dups. What can you do? The idea is basically the same as my previous post, you just need to find a way to break things up into chunks so that you do not fill up your log (even if you use simple logging).
There are a number of ways you can break things up, you can use a loop with rowcount which results in cleaner and more concise code, but this is not going to work for de-duping, and its often faster to take advantage of whatever indexes you have on your table already (and every table should have at least a clustered index).
/* This version uses just one nesting level, and an increment value of one. To break the data up even more, additional nesting levels may be required (code to follow) and/or you may need to use an increment greater than one (second example shows how to do this). */
alter table [Table with Duplicates] add PK int NULL
declare @i int declare @Maxi int declare @intCounter int set @intCounter = 0 select @i = min([indexed Column 1]) from [Table with Duplicates] /* Example: A time ID column */ select @Maxi = max([indexed Column 1])+1 from [Table with Duplicates]
while @i<@Maxi begin update [Table with Duplicates] SET @intCounter = PK = @intCounter + 1 where [indexed Column 1] = @i checkpoint /* If you are using truncate on checkpoint (simple logging) this will ensure the log space is free */ --Otherwise you can manually truncate the log at this point (substitute you log file name below): --dbcc shrinkfile ([CurrentDB_log], 1000, truncateonly) select @i = @i+1 end
/* This same concept as above can be expanded, to chunk things up even more using additional nesting levels. The code might change based on how many values you have in the column you are using to chunk things up, so for example you might increment by 1 or by 100,000 here's an example with two nesting levels: */
alter table [Table with Duplicates] add PK int NULL
declare @i int declare @Maxi int declare @j int declare @Maxj int declare @jIncrement int declare @intCounter int set @intCounter = 0 set @jIncrement = 10000 select @i = min([indexed Column 1]) from [Table with Duplicates] /* Example: A time ID column */ select @Maxi = max([indexed Column 1])+1 from [Table with Duplicates] select @j = 0 select @Maxj = max([indexed Column 1])+@jIncrement from [Table with Duplicates]
while @i<@Maxi begin while @j<@Maxj begin update [Table with Duplicates] SET @intCounter = PK = @intCounter + 1 where [indexed Column 1] = @i and [indexed Column 2]>@j AND [indexed Column 2]<@j+@jIncrement
checkpoint /* If you are using truncate on checkpoint (simple logging) this will ensure the log space is free */ --Otherwise you can manually truncate the log at this point (substitute you log file name below): --dbcc shrinkfile ([CurrentDB_log], 1000, truncateonly)
select @j = @j+@jIncrement end select @i = @i+1 set @j=0 end
So the above concepts can be applied to the actual de-duping code as well, and you end up with something like this: /* This version uses just one nesting level, and an increment value of one. To break the data up even more, additional nesting levels may be required (code to follow) and/or you may need to use an increment greater than one (second example shows how to do this). */
declare @i int declare @Maxi int declare @intCounter int set @intCounter = 0 select @i = min([indexed Column 1]) from [Table with Duplicates] /* Example: A time ID column */ select @Maxi = max([indexed Column 1])+1 from [Table with Duplicates]
while @i<@Maxi begin DELETE a FROM [Table with Duplicates] a join [Table with Duplicates] b ON a.[Value to check]= b.[Value to Check] -- i.e. Lastname AND a.[Second Value to Check] = b.[Second Value to Check] -- i.e. Firstname WHERE a.[Primary Key Field] < b.[Primary Key Field] -- i.e. Userkey AND [indexed Column 1] = @i checkpoint /* If you are using truncate on checkpoint (simple logging) this will ensure the log space is free */ --Otherwise you can manually truncate the log at this point (substitute you log file name below): --dbcc shrinkfile ([CurrentDB_log], 1000, truncateonly)
select @i = @i+1 end
And again an example with breaking it up even further: declare @i int declare @Maxi int declare @j int declare @Maxj int declare @jIncrement int declare @intCounter int set @intCounter = 0 set @jIncrement = 10000 select @i = min([indexed Column 1]) from [Table with Duplicates] /* Example: A time ID column */ select @Maxi = max([indexed Column 1])+1 from [Table with Duplicates] select @j = 0 select @Maxj = max([indexed Column 1])+@jIncrement from [Table with Duplicates]
while @i<@Maxi begin while @j<@Maxj begin DELETE a FROM [Table with Duplicates] a join [Table with Duplicates] b ON a.[Value to check]= b.[Value to Check] -- i.e. Lastname AND a.[Second Value to Check] = b.[Second Value to Check] -- i.e. Firstname WHERE a.[Primary Key Field] < b.[Primary Key Field] -- i.e. Userkey AND [indexed Column 1] = @i AND [indexed Column 2]>@j AND [indexed Column 2]<@j+@jIncrement checkpoint /* If you are using truncate on checkpoint (simple logging) this will ensure the log space is free */ --Otherwise you can manually truncate the log at this point (substitute you log file name below): --dbcc shrinkfile ([CurrentDB_log], 1000, truncateonly) select @j = @j+@jIncrement end select @i = @i+1 set @j=0 end
|
 |
|
|
shaileshk
Starting Member
India
6 Posts |
|
|
cognecy
Starting Member
1 Posts |
Posted - 06/03/2008 : 02:10:23
|
NTEXT datatype are a real stick in my side when it comes to removing duplicate rows in a table. I am trying to remove about 7 duplicate rows with in my SQL2005 table using a query. The last column in the table is of ntext data type. I learned this technique that I show below from a posting in another forum some time ago and it works great as long as the table does not have any ntext data types in it. Please take a look at my query below and tell me what I must do to tweak it and make it work with the one ntext column.
--Created "Temporary" Table to store Duplicate Data CREATE TABLE #TEMPDUPTABLE ( [idOrder] [int] NULL , [cardtype] [nvarchar] (50) NULL , [cardnumber] [nvarchar] (100) NULL , [expiration] [datetime] NULL , [seqcode] [nvarchar] (10) NULL , [comments] [ntext] NULL --NTEXT field is the problem ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY] --Thought this would work
--Identify Dulpicate Data and save to "Temporary" table INSERT INTO #TEMPDUPTABLE SELECT * FROM dbo.creditCards GROUP BY idOrder,cardtype,cardnumber,expiration,seqcode,comments HAVING COUNT(*) > 1
--Confirmation of duplicate rows SELECT @@ROWCOUNT AS 'Number of Duplicate Rows'
--Delete duplicates from the original table DELETE FROM dbo.creditCards FROM dbo.creditCards INNER JOIN #TEMPDUPTABLE ON dbo.creditCards.idOrder = #TEMPDUPTABLE.idOrder AND dbo.creditCards.cardtype = #TEMPDUPTABLE.cardtype AND dbo.creditCards.cardnumber = #TEMPDUPTABLE.cardnumber AND dbo.creditCards.expiration = #TEMPDUPTABLE.expiration AND dbo.creditCards.seqcode = #TEMPDUPTABLE.seqcode AND dbo.creditCards.comments = #TEMPDUPTABLE.comments
--Insert the "Cleaned" data back into original table INSERT INTO dbo.creditCards SELECT * FROM #TEMPDUPTABLE
--Check once more for duplicate data. SELECT * FROM dbo.creditCards GROUP BY idOrder,cardtype,cardnumber,expiration,seqcode,comments HAVING COUNT(*) > 1
--QA Check the table SELECT * FROM dbo.creditCards
--Drop the "Temporary" table DROP TABLE #TEMPDUPTABLE GO
|
 |
|
|
zhshqzyc
Posting Yak Master
240 Posts |
Posted - 09/08/2008 : 12:08:03
|
I have a table variable @x that only has one column. How to remove the duplicate records for this specific case?
Thanks.
Select * From @x
19/20" TV
25/27" TV
FRONT PROJECTION TV
ANALOG PROJECTION TV
BLACK & WHITE TV
TV DISPLAY PRODUCT
13" TV
31"+ ANALOG TUBE TV
COMBO TV
UP TO 20 LCD TV
LESS 29 DIG TUBE TV
21-29 LCD TV
... |
 |
|
|
manus
Starting Member
India
1 Posts |
Posted - 08/27/2010 : 07:10:00
|
DELETE DUPLICATE ROWS WHERE IS NO INDEXING COLUMN IN TABLE WITHOUT USING TEMP OR EXTRA TABLES
STEPS: Consider a table "test" with Column1 and column2 column1 column2 1 h 2 c 1 h 3 g 2 c 8 y
1.Find the total count of rows in the table and store it in a varable say @count
declare @count int select @count =COUNT(*) from test select @count ---here it is 6
2 Find the columns which are repeating and insert those into "test"table itself
insert into test select column1 ,column2 FROM test GROUP BY column1,column2 HAVING ( COUNT(column1) > 1 and COUNT(column2)> 1) 3 Find the columns which are only once and insert those into "test" table itself
insert into test select column1 ,column2 FROM test GROUP BY column1,column2 HAVING ( COUNT(column1) =1 and COUNT(column2)=1) The table content now will be column1 column2 1 h 2 c 1 h 3 g 2 c 8 y 2 c 1 h 3 g 8 y 4 Now delete Top @count number of rows and u will get the final result
DELETE TOP (@count) FROM test
column1 column2 2 c 1 h 3 g 8 y
Manu |
 |
|
|
virenderverma
Starting Member
India
1 Posts |
Posted - 02/21/2011 : 05:20:49
|
CREATE TABLE Users ( FirstName nvarchar(50), LastName nvarchar(50) ) GO
INSERT INTO Users (FirstName, LastName) VALUES (N'Eralper',N'Yilmaz') INSERT INTO Users (FirstName, LastName) VALUES (N'Elvis',N'Presley') INSERT INTO Users (FirstName, LastName) VALUES (N'Red',N'Kit') INSERT INTO Users (FirstName, LastName) VALUES (N'Jane',N'Fonda') INSERT INTO Users (FirstName, LastName) VALUES (N'Red',N'Kit') INSERT INTO Users (FirstName, LastName) VALUES (N'James',N'White') INSERT INTO Users (FirstName, LastName) VALUES (N'Alan',N'Black') INSERT INTO Users (FirstName, LastName) VALUES (N'Elvis',N'Presley') INSERT INTO Users (FirstName, LastName) VALUES (N'Jane',N'Fonda') INSERT INTO Users (FirstName, LastName) VALUES (N'Elvis',N'Presley') INSERT INTO Users (FirstName, LastName) VALUES (N'Jane',N'Fonda') INSERT INTO Users (FirstName, LastName) VALUES (N'Elvis',N'Presley')
while ( (SELECT top 1 count(*) FROM Users GROUP BY FirstName, LastName HAVING Count(*) > 1)!=1 ) begin
set rowcount 1 delete Users where FirstName in (SELECT FirstName FROM Users GROUP BY FirstName, LastName HAVING Count(*) > 1)
end
|
 |
|
|
expert
Starting Member
4 Posts |
|
|
maeenul
Starting Member
20 Posts |
|
|
maeenul
Starting Member
20 Posts |
|
Topic  |
|