Deleting Duplicates - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

New to SQL Server Programming

Deleting Duplicates

Author

Topic

dranedav
Starting Member

5 Posts

Posted - 2014-01-06 : 13:43:23

I need to remove duplicate data from around 25 tables. I want to use a while loop to go through all tables. If I list out all of the column names the query runs fine, but since there are 25 tables some with 50 plus columns I was hoping to use something like the following, which errors out because my sub queries return more than one result.

SELECT q.* from
(Select ROW_NUMBER() OVER ( Partition BY (SELECT [name] AS [Name] FROM syscolumns
WHERE id = (SELECT id FROM sysobjects
WHERE type = 'U'
AND [Name] = 'Orders')
)
Order by (select top 1 [name] AS [Name] FROM syscolumns
WHERE id = (SELECT id FROM sysobjects
WHERE type = 'U'
AND [Name] = 'Orders')
and name = 'UniqueId')) R,
(SELECT [name] AS [Name] FROM syscolumns
WHERE id = (SELECT id FROM sysobjects
WHERE type = 'U'
AND [Name] = 'Orders')) AA
From Orders
)q
where R > 1

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2014-01-06 : 16:39:57

I have no idea what your query has to do with removing duplicates and while loops but it looks like that nasty statement with all the sub-queries can be replaced by one of these:


select name
from   sysColumns
where  id = object_id('Orders')
and    name != 'UniqueId'
order by colid

--information_schema views are preferred over direct queries on the system tables
select column_name 
from information_schema.columns 
where table_name = 'Orders' 
and column_name != 'UniqueId'
order by ordinal_position

Be One with the Optimizer
TG

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2014-01-07 : 01:02:22

for removing duplicates first define which columns you need to consider to identify a duplicate combination. Then give us some sample data to work with and show your desired output out of them.

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs

dranedav
Starting Member

5 Posts

Posted - 2014-01-23 : 08:58:23

Sorry for not getting back to this sooner, but my test server had hard drive failure. I am still in the process of getting a new drive so I can get Windows installed. Once I get the system back up I will update this post again. As far as the script that I posted it is just a small part of the entire query I intend to use. I want to get the select statement right before changing it to a delete statement and adding it to my while loop. I am using row_number to compare the data because the person that originally wrote a query to remove the duplicates is using multiple cursors and adding a new column to all the tables then removing the column when the query is done running. I thought it might be better to use a while loop and to use row_number instead of adding a new column.

dranedav
Starting Member

5 Posts

Posted - 2014-01-23 : 10:12:18

Here is the entire query that I am currently using.

SET NOCOUNT ON

declare @id int
Declare @Table varchar (100)
Declare @Column varchar (100)
Declare @Tablecount int
Declare @Columncount int
Declare @id2 int
DECLARE @SQLcmd VARCHAR(max)
DECLARE @indexName VARCHAR(100)
DECLARE @uniqueColumn VARCHAR(100)

Set @id=1
Set @id2=1
SET @uniqueColumn = 'Hash'

IF OBJECT_ID('tempdb..#tableList', 'U') IS NOT NULL DROP TABLE #tableList
CREATE TABLE #tableList (Id int IDENTITY (1, 1) NOT NULL,
Table_name VARCHAR(100))
INSERT INTO #tableList (Table_name)
SELECT name FROM sysobjects WHERE xtype='U' and name like 'AMH%' and name not like '%Rollup%'

set @Tablecount = (select count(*) FROM sysobjects WHERE xtype='U' and name like 'AMH%')

--This section will go through all the tables that can have duplicates in them and remove the duplicates.
while @id < @Tablecount
begin
Set @Table = (SELECT Table_name
from #tableList
where Id = @id )

IF OBJECT_ID('tempdb..#Column_list', 'U') IS NOT NULL DROP TABLE #Column_list
CREATE TABLE #Column_list (Id int IDENTITY (1, 1) NOT NULL,
Column_name VARCHAR(100))
INSERT INTO #Column_list (Column_name)
SELECT [name] AS [Name]
FROM syscolumns
WHERE id = (SELECT id FROM sysobjects WHERE type = 'U' AND [Name] = @Table)

set @Columncount = (SELECT count(*) FROM syscolumns WHERE id = (SELECT id FROM sysobjects WHERE type = 'U' AND [Name] = @Table))
--This section will show you what table we are currently looking at and will then delete the duplicates from the table.
While @id2 < @Columncount
begin
Set @Column = (SELECT Column_name
from #Column_list
where Id = @id2)
Print 'The Table we are looking at is ' +@Table+ ' the column is ' +@Column+ ' ***'
Set @SQLcmd = 'delete q
(Select ROW_NUMBER() OVER ( PARTITION BY ' + @Column + ' order by ' + @Column +') R
,' + @Column + '
From ' + @Table +')q
where R > 1'
Print @SQLcmd
set @id2 = @id2 + 1
End

-- This section creates a unique, non-clustered index with ignore_dup_key
-- (This will stop duplicate records without halting execution.)
IF @createIndex = 1 BEGIN
SET @indexName = 'IDX_' + @Table + '_Dup-Ignore'
SET @sqlCmd = 'CREATE UNIQUE NONCLUSTERED INDEX ' + @indexName + ' ON ' + @Table + ' (' + @uniqueColumn + ' ) WITH IGNORE_DUP_KEY'
PRINT 'Creating Unique, Nonclustered Index'
BEGIN TRY
EXEC (@sqlCmd)
END TRY
BEGIN CATCH
PRINT '*** Error Creating Unique, Nonclustered Index ***'
PRINT ERROR_MESSAGE()
GOTO THEEND
END CATCH
PRINT 'Successfully Created Unique, Nonclustered Index'
END

set @id = @id + 1
end

THEEND
IF OBJECT_ID('tempdb..#Column_list', 'U') IS NOT NULL DROP TABLE #Column_list
PRINT 'Duplicate Removal Script has Completed.'

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2014-01-23 : 13:03:26

I repeat
Give us some sample data to understand your scenario. No use posting queries without having any info on your tables, data

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs

dranedav
Starting Member

5 Posts

Posted - 2014-01-23 : 14:02:57

Hash UniquId Facility Type Location MedId MedName Inventory CostPer TotalCost Facility AgeColumn SiteName
1508917155|1162179050 NULL 1 Automated Filling BLIST 1653 GABAPENTIN 0 0 0 PRIMARY DELIVERY SITE 00:00.0 Test
1508917155|1162179050 NULL 1 Automated Filling BLIST 1655 GABAPENTIN 0 0 0 PRIMARY DELIVERY SITE 00:00.0 Test
1389886263|1682155837 NULL 1 Cabinet 43 388 DEXTROSE 25 0 0 PRIMARY DELIVERY SITE 00:00.0 Test
1389886263|1682155837 NULL 1 Cabinet 53 388 DEXTROSE 25 0 0 PRIMARY DELIVERY SITE 00:00.0 Test
-777073692|-351477799 NULL 1 OTHER NORO 1505 CHLORPROMAZINE 0 0 0 PRIMARY DELIVERY SITE 00:00.0 Test
-777073692|-351477799 NULL 1 OTHER NORO 1506 CHLORPROMAZINE 0 0 0 PRIMARY DELIVERY SITE 00:00.0 Test

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2014-01-24 : 01:12:39

[code]
SELECT *
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY Hash,UniqID ORDER BY MedId) AS Seq,*
FROM Table
)t
WHERE Seq=1
[/code]
I'm assuming column names as its not quite clear from your posted code which values belong to which column

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs

sqlsaga
Yak Posting Veteran

93 Posts

Posted - 2014-01-24 : 12:22:59

http://sqlsaga.com/sql-server/how-to-remove-duplicates-from-a-table-in-sql-server/

I have written a step by step article on how to remove duplicates from a table. Please refer, so that you can use it to your requirement.

Visit www.sqlsaga.com for more t-sql snippets and BI related how to's.

dranedav
Starting Member

5 Posts

Posted - 2014-01-24 : 15:38:19

Thank you everyone for your help. I was able to get this working correctly. Below is the script that I am using.

SET NOCOUNT ON

declare @id int
Declare @Table varchar (100)
Declare @Column varchar (100)
Declare @Tablecount int
Declare @Columncount int
Declare @id2 int
DECLARE @SQLcmd VARCHAR(max)
DECLARE @indexName VARCHAR(100)
DECLARE @uniqueColumn VARCHAR(100)
DECLARE @createIndex INTEGER

Set @id=1
Set @id2=1

IF OBJECT_ID('tempdb..#tableList', 'U') IS NOT NULL DROP TABLE #tableList
CREATE TABLE #tableList (Id int IDENTITY (1, 1) NOT NULL,
Table_name VARCHAR(100))
INSERT INTO #tableList (Table_name)
select name from sysobjects
where name not like '%Rollup%'
and xtype='U'
and type = 'U'
and name like 'AHM%'
and id in (select distinct (id) from syscolumns where name = 'Hash')

set @Tablecount = (select count(*) FROM sysobjects WHERE xtype='U')

while @id < @Tablecount
begin
Set @Table = (SELECT Table_name
from #tableList
where Id = @id )

--Adds columns to a temp table
IF OBJECT_ID('tempdb..#Column_list', 'U') IS NOT NULL DROP TABLE #Column_list
CREATE TABLE #Column_list (Id int IDENTITY (1, 1) NOT NULL,
Column_name VARCHAR(100))
INSERT INTO #Column_list (Column_name)
SELECT top 1 [name] AS [Name]
FROM syscolumns
WHERE id = (SELECT id FROM sysobjects WHERE type = 'U' AND [Name] = @Table)

set @Columncount = (SELECT count(*) FROM syscolumns WHERE id = (SELECT id FROM sysobjects WHERE type = 'U' AND [Name] = @Table))

begin
Set @Column = (SELECT Column_name
from #Column_list
where Id = @id2
and Column_name = 'Hash')
Print '--The Table we are looking at is ' +@Table+ ' ***'
--Adds any missing Hash values
Begin
SET @SQLcmd =
' UPDATE ' + @Table + ' SET
Hash = LTRIM(RTRIM(CONVERT(VARCHAR, CHECKSUM(*)))) + ''|'' + LTRIM(RTRIM(CONVERT(VARCHAR, BINARY_CHECKSUM(*))))
WHERE Hash is NULL'
PRINT 'Updating Missing Hash Checksums in ' + @Table
EXEC (@SQLcmd)
End
Begin
Print '--The Table we are deleting from is ' +@Table+ ' ***'
Set @SQLcmd = 'Delete q from
(Select ROW_NUMBER() OVER ( PARTITION BY ' + @Column + ' order by ' + @Column +')R
,'+ @Column + '
From ' + @Table +' )q
where R > 1'
EXEC (@SQLcmd)
End

End
-- This section creates a unique, non-clustered index with ignore_dup_key
-- (This will stop duplicate records without halting execution.)
IF @createIndex = 1
BEGIN
SET @indexName = 'IDX_' + @Table + '_Dup-Ignore'
SET @SQLcmd = 'CREATE UNIQUE NONCLUSTERED INDEX ' + @indexName + ' ON ' + @Table + ' (' + @uniqueColumn + ' ) WITH IGNORE_DUP_KEY'
PRINT 'Creating Unique, Nonclustered Index'
BEGIN TRY
EXEC (@SQLcmd)
END TRY
BEGIN CATCH
PRINT '*** Error Creating Unique, Nonclustered Index ***'
PRINT ERROR_MESSAGE()
END CATCH
PRINT 'Successfully Created Unique, Nonclustered Index'
END
set @id = @id + 1
end

IF OBJECT_ID('tempdb..#tableList', 'U') IS NOT NULL DROP TABLE #tableList
IF OBJECT_ID('tempdb..#Column_list', 'U') IS NOT NULL DROP TABLE #Column_list
PRINT 'Duplicate Removal Script has Completed.'

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources