Author |
Topic |
dranedav
Starting Member
5 Posts |
Posted - 2014-01-06 : 13:43:23
|
I need to remove duplicate data from around 25 tables. I want to use a while loop to go through all tables. If I list out all of the column names the query runs fine, but since there are 25 tables some with 50 plus columns I was hoping to use something like the following, which errors out because my sub queries return more than one result. SELECT q.* from (Select ROW_NUMBER() OVER ( Partition BY (SELECT [name] AS [Name] FROM syscolumns WHERE id = (SELECT id FROM sysobjects WHERE type = 'U' AND [Name] = 'Orders') ) Order by (select top 1 [name] AS [Name] FROM syscolumns WHERE id = (SELECT id FROM sysobjects WHERE type = 'U' AND [Name] = 'Orders') and name = 'UniqueId')) R, (SELECT [name] AS [Name] FROM syscolumns WHERE id = (SELECT id FROM sysobjects WHERE type = 'U' AND [Name] = 'Orders')) AA From Orders )qwhere R > 1 |
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2014-01-06 : 16:39:57
|
I have no idea what your query has to do with removing duplicates and while loops but it looks like that nasty statement with all the sub-queries can be replaced by one of these:select namefrom sysColumnswhere id = object_id('Orders')and name != 'UniqueId'order by colid--information_schema views are preferred over direct queries on the system tablesselect column_name from information_schema.columns where table_name = 'Orders' and column_name != 'UniqueId'order by ordinal_position Be One with the OptimizerTG |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2014-01-07 : 01:02:22
|
for removing duplicates first define which columns you need to consider to identify a duplicate combination. Then give us some sample data to work with and show your desired output out of them.------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/https://www.facebook.com/VmBlogs |
|
|
dranedav
Starting Member
5 Posts |
Posted - 2014-01-23 : 08:58:23
|
Sorry for not getting back to this sooner, but my test server had hard drive failure. I am still in the process of getting a new drive so I can get Windows installed. Once I get the system back up I will update this post again. As far as the script that I posted it is just a small part of the entire query I intend to use. I want to get the select statement right before changing it to a delete statement and adding it to my while loop. I am using row_number to compare the data because the person that originally wrote a query to remove the duplicates is using multiple cursors and adding a new column to all the tables then removing the column when the query is done running. I thought it might be better to use a while loop and to use row_number instead of adding a new column. |
|
|
dranedav
Starting Member
5 Posts |
Posted - 2014-01-23 : 10:12:18
|
Here is the entire query that I am currently using.SET NOCOUNT ONdeclare @id intDeclare @Table varchar (100)Declare @Column varchar (100)Declare @Tablecount intDeclare @Columncount intDeclare @id2 intDECLARE @SQLcmd VARCHAR(max)DECLARE @indexName VARCHAR(100)DECLARE @uniqueColumn VARCHAR(100)Set @id=1Set @id2=1SET @uniqueColumn = 'Hash'IF OBJECT_ID('tempdb..#tableList', 'U') IS NOT NULL DROP TABLE #tableListCREATE TABLE #tableList (Id int IDENTITY (1, 1) NOT NULL, Table_name VARCHAR(100)) INSERT INTO #tableList (Table_name) SELECT name FROM sysobjects WHERE xtype='U' and name like 'AMH%' and name not like '%Rollup%'set @Tablecount = (select count(*) FROM sysobjects WHERE xtype='U' and name like 'AMH%')--This section will go through all the tables that can have duplicates in them and remove the duplicates.while @id < @Tablecountbegin Set @Table = (SELECT Table_name from #tableList where Id = @id ) IF OBJECT_ID('tempdb..#Column_list', 'U') IS NOT NULL DROP TABLE #Column_list CREATE TABLE #Column_list (Id int IDENTITY (1, 1) NOT NULL, Column_name VARCHAR(100)) INSERT INTO #Column_list (Column_name) SELECT [name] AS [Name] FROM syscolumns WHERE id = (SELECT id FROM sysobjects WHERE type = 'U' AND [Name] = @Table) set @Columncount = (SELECT count(*) FROM syscolumns WHERE id = (SELECT id FROM sysobjects WHERE type = 'U' AND [Name] = @Table)) --This section will show you what table we are currently looking at and will then delete the duplicates from the table. While @id2 < @Columncount begin Set @Column = (SELECT Column_name from #Column_list where Id = @id2) Print 'The Table we are looking at is ' +@Table+ ' the column is ' +@Column+ ' ***' Set @SQLcmd = 'delete q (Select ROW_NUMBER() OVER ( PARTITION BY ' + @Column + ' order by ' + @Column +') R ,' + @Column + ' From ' + @Table +')q where R > 1' Print @SQLcmd set @id2 = @id2 + 1 End -- This section creates a unique, non-clustered index with ignore_dup_key -- (This will stop duplicate records without halting execution.) IF @createIndex = 1 BEGIN SET @indexName = 'IDX_' + @Table + '_Dup-Ignore' SET @sqlCmd = 'CREATE UNIQUE NONCLUSTERED INDEX ' + @indexName + ' ON ' + @Table + ' (' + @uniqueColumn + ' ) WITH IGNORE_DUP_KEY' PRINT 'Creating Unique, Nonclustered Index' BEGIN TRY EXEC (@sqlCmd) END TRY BEGIN CATCH PRINT '*** Error Creating Unique, Nonclustered Index ***' PRINT ERROR_MESSAGE() GOTO THEEND END CATCH PRINT 'Successfully Created Unique, Nonclustered Index' END set @id = @id + 1endTHEENDIF OBJECT_ID('tempdb..#Column_list', 'U') IS NOT NULL DROP TABLE #Column_listPRINT 'Duplicate Removal Script has Completed.' |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2014-01-23 : 13:03:26
|
I repeatGive us some sample data to understand your scenario. No use posting queries without having any info on your tables, data------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/https://www.facebook.com/VmBlogs |
|
|
dranedav
Starting Member
5 Posts |
Posted - 2014-01-23 : 14:02:57
|
Hash UniquId Facility Type Location MedId MedName Inventory CostPer TotalCost Facility AgeColumn SiteName1508917155|1162179050 NULL 1 Automated Filling BLIST 1653 GABAPENTIN 0 0 0 PRIMARY DELIVERY SITE 00:00.0 Test1508917155|1162179050 NULL 1 Automated Filling BLIST 1655 GABAPENTIN 0 0 0 PRIMARY DELIVERY SITE 00:00.0 Test1389886263|1682155837 NULL 1 Cabinet 43 388 DEXTROSE 25 0 0 PRIMARY DELIVERY SITE 00:00.0 Test1389886263|1682155837 NULL 1 Cabinet 53 388 DEXTROSE 25 0 0 PRIMARY DELIVERY SITE 00:00.0 Test-777073692|-351477799 NULL 1 OTHER NORO 1505 CHLORPROMAZINE 0 0 0 PRIMARY DELIVERY SITE 00:00.0 Test-777073692|-351477799 NULL 1 OTHER NORO 1506 CHLORPROMAZINE 0 0 0 PRIMARY DELIVERY SITE 00:00.0 Test |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2014-01-24 : 01:12:39
|
[code]SELECT *FROM(SELECT ROW_NUMBER() OVER (PARTITION BY Hash,UniqID ORDER BY MedId) AS Seq,*FROM Table)tWHERE Seq=1[/code]I'm assuming column names as its not quite clear from your posted code which values belong to which column------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/https://www.facebook.com/VmBlogs |
|
|
sqlsaga
Yak Posting Veteran
93 Posts |
Posted - 2014-01-24 : 12:22:59
|
http://sqlsaga.com/sql-server/how-to-remove-duplicates-from-a-table-in-sql-server/I have written a step by step article on how to remove duplicates from a table. Please refer, so that you can use it to your requirement.Visit www.sqlsaga.com for more t-sql snippets and BI related how to's. |
|
|
dranedav
Starting Member
5 Posts |
Posted - 2014-01-24 : 15:38:19
|
Thank you everyone for your help. I was able to get this working correctly. Below is the script that I am using.SET NOCOUNT ONdeclare @id intDeclare @Table varchar (100)Declare @Column varchar (100)Declare @Tablecount intDeclare @Columncount intDeclare @id2 intDECLARE @SQLcmd VARCHAR(max)DECLARE @indexName VARCHAR(100)DECLARE @uniqueColumn VARCHAR(100)DECLARE @createIndex INTEGERSet @id=1Set @id2=1IF OBJECT_ID('tempdb..#tableList', 'U') IS NOT NULL DROP TABLE #tableListCREATE TABLE #tableList (Id int IDENTITY (1, 1) NOT NULL, Table_name VARCHAR(100)) INSERT INTO #tableList (Table_name) select name from sysobjects where name not like '%Rollup%' and xtype='U' and type = 'U' and name like 'AHM%' and id in (select distinct (id) from syscolumns where name = 'Hash')set @Tablecount = (select count(*) FROM sysobjects WHERE xtype='U')while @id < @Tablecountbegin Set @Table = (SELECT Table_name from #tableList where Id = @id )--Adds columns to a temp table IF OBJECT_ID('tempdb..#Column_list', 'U') IS NOT NULL DROP TABLE #Column_list CREATE TABLE #Column_list (Id int IDENTITY (1, 1) NOT NULL, Column_name VARCHAR(100)) INSERT INTO #Column_list (Column_name) SELECT top 1 [name] AS [Name] FROM syscolumns WHERE id = (SELECT id FROM sysobjects WHERE type = 'U' AND [Name] = @Table) set @Columncount = (SELECT count(*) FROM syscolumns WHERE id = (SELECT id FROM sysobjects WHERE type = 'U' AND [Name] = @Table)) begin Set @Column = (SELECT Column_name from #Column_list where Id = @id2 and Column_name = 'Hash') Print '--The Table we are looking at is ' +@Table+ ' ***' --Adds any missing Hash values Begin SET @SQLcmd = ' UPDATE ' + @Table + ' SET Hash = LTRIM(RTRIM(CONVERT(VARCHAR, CHECKSUM(*)))) + ''|'' + LTRIM(RTRIM(CONVERT(VARCHAR, BINARY_CHECKSUM(*)))) WHERE Hash is NULL' PRINT 'Updating Missing Hash Checksums in ' + @Table EXEC (@SQLcmd) End Begin Print '--The Table we are deleting from is ' +@Table+ ' ***' Set @SQLcmd = 'Delete q from (Select ROW_NUMBER() OVER ( PARTITION BY ' + @Column + ' order by ' + @Column +')R ,'+ @Column + ' From ' + @Table +' )q where R > 1' EXEC (@SQLcmd) End End-- This section creates a unique, non-clustered index with ignore_dup_key -- (This will stop duplicate records without halting execution.) IF @createIndex = 1 BEGIN SET @indexName = 'IDX_' + @Table + '_Dup-Ignore' SET @SQLcmd = 'CREATE UNIQUE NONCLUSTERED INDEX ' + @indexName + ' ON ' + @Table + ' (' + @uniqueColumn + ' ) WITH IGNORE_DUP_KEY' PRINT 'Creating Unique, Nonclustered Index' BEGIN TRY EXEC (@SQLcmd) END TRY BEGIN CATCH PRINT '*** Error Creating Unique, Nonclustered Index ***' PRINT ERROR_MESSAGE() END CATCH PRINT 'Successfully Created Unique, Nonclustered Index' END set @id = @id + 1endIF OBJECT_ID('tempdb..#tableList', 'U') IS NOT NULL DROP TABLE #tableListIF OBJECT_ID('tempdb..#Column_list', 'U') IS NOT NULL DROP TABLE #Column_listPRINT 'Duplicate Removal Script has Completed.' |
|
|
|