Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 [SOLVED] help eliminate dupes

Author  Topic 

SparkByte
Yak Posting Veteran

60 Posts

Posted - 2006-11-30 : 14:14:07
I am VERY new to SQL and I am having a heck of a time biulding a script to find and remove duplicate entries.

Here is the table structure.




CREATE TABLE [dbo].[SecurityEvents](
[EventLog] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[RecordNumber] [int] NULL,
[TimeGenerated] [datetime] NULL,
[TimeWritten] [datetime] NULL,
[EventID] [int] NULL,
[EventType] [int] NULL,
[EventTypeName] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[EventCategory] [int] NULL,
[EventCategoryName] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[SourceName] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Strings] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[ComputerName] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[SID] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Message] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Data] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]

GO



This small script seems to eliminate the dupes, but I can't seem to figure out to properly replce the table the with output of the script with all the dupes gone.




select distinct * from dbo.SecurityEventsTest where recordnumber IN
(select recordnumber from dbo.SecurityEvents)
order by recordnumber



Could someone help??

Thank You,

John Fuhrman
http://www.titangs.com

snSQL
Master Smack Fu Yak Hacker

1837 Posts

Posted - 2006-11-30 : 14:53:19
You need to add a primary key or unique index to the table to prevent further dupes. Even if you still allow the duplicates to be inserted into the non-key columns, it will be much easier to remove them if you have a primary key or unique column as well as all these duplicated columns.

Here's how you can remove the dupes you have now, but it will perform pretty badly because it has to hold all the dupes in a temp table when you delete them.

SELECT [EventLog],
[RecordNumber],
[TimeGenerated],
[TimeWritten],
[EventID],
[EventType],
[EventTypeName],
[EventCategory],
[EventCategoryName],
[SourceName],
[Strings],
[ComputerName],
[SID],
[Message],
[Data]
INTO #Dups
FROM [dbo].[SecurityEvents]
GROUP BY [EventLog],
[RecordNumber],
[TimeGenerated],
[TimeWritten],
[EventID],
[EventType],
[EventTypeName],
[EventCategory],
[EventCategoryName],
[SourceName],
[Strings],
[ComputerName],
[SID],
[Message],
[Data]
HAVING count(*) > 1

DELETE [dbo].[SecurityEvents]
FROM [dbo].[SecurityEvents] S
INNER JOIN #Dups D ON D.[EventLog] = S.[EventLog] AND
D.[RecordNumber] = S.[RecordNumber] AND
D.[TimeGenerated] = S.[TimeGenerated] AND
D.[TimeWritten] = S.[TimeWritten] AND
D.[EventID] = S.[EventID] AND
D.[EventType] = S.[EventType] AND
D.[EventTypeName] = S.[EventTypeName] AND
D.[EventCategory] = S.[EventCategory] AND
D.[EventCategoryName] = S.[EventCategoryName] AND
D.[SourceName] = S.[SourceName] AND
D.[Strings] = S.[Strings] AND
D.[ComputerName] = S.[ComputerName] AND
D.[SID] = S.[SID] AND
D.[Message] = S.[Message] AND
D.[Data] = S.[Data]

INSERT [dbo].[SecurityEvents]
SELECT * FROM #Dups

DROP TABLE #Dups
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2006-11-30 : 15:07:26
whoah...easy now, what if they just want 1 occurance? And I would copy anything deleted to a permanent table, or at least bcp it out to a file

I wonder if 2k5 allows recovery of a single table yet



Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam



Go to Top of Page

snSQL
Master Smack Fu Yak Hacker

1837 Posts

Posted - 2006-11-30 : 15:08:59
"whoah...easy now, what if they just want 1 occurance?"
This will leave exactly one occurence of each each unique row in the table.
Go to Top of Page

SparkByte
Yak Posting Veteran

60 Posts

Posted - 2006-11-30 : 15:39:53
What would be the difference with something like this??
drop table SecurityEventsTemp

select distinct dbo.SecurityEvents.*
into dbo.SecurityEventsTemp
from SecurityEvents

select * from SecurityEvents order by RecordNumber


Then droping the original table and renaming the temp?

Also, this table is getting quite large as it hold the security logs from about 65 servers. So performance could be an issue in a few months.

Thanks!

Thank You,

John Fuhrman
http://www.titangs.com
Go to Top of Page

SparkByte
Yak Posting Veteran

60 Posts

Posted - 2006-11-30 : 15:41:28
Oh, this is SQL 2000,

Sorry.

Thank You,

John Fuhrman
http://www.titangs.com
Go to Top of Page

snSQL
Master Smack Fu Yak Hacker

1837 Posts

Posted - 2006-11-30 : 16:00:04
quote:
Originally posted by SparkByte

What would be the difference with something like this??
drop table SecurityEventsTemp

select distinct dbo.SecurityEvents.*
into dbo.SecurityEventsTemp
from SecurityEvents

select * from SecurityEvents order by RecordNumber


Then droping the original table and renaming the temp?

Also, this table is getting quite large as it hold the security logs from about 65 servers. So performance could be an issue in a few months.


Doing it the way you show it, will copy every row (even those that are not duplicated) every time. My code will delete the dupes then add back a single copy of each one. All the rows that are not duplicated will never be deleted or reinserted.
If performance is a problem then you should be making sure now that you stop inserting the duplicates in the first place.
Go to Top of Page

SparkByte
Yak Posting Veteran

60 Posts

Posted - 2006-11-30 : 16:30:13
Even with the 'select distinct'? This query did seem to write the new test table without the duplicates.

I can't even begin to understand the dirrerences between the two methods. Especially since the last programming course I took basic still used line numbering.

Thanks!!!

Thank You,

John Fuhrman
http://www.titangs.com
Go to Top of Page

snSQL
Master Smack Fu Yak Hacker

1837 Posts

Posted - 2006-11-30 : 17:44:38
If you do this
drop table SecurityEventsTemp

select distinct dbo.SecurityEvents.*
into dbo.SecurityEventsTemp
from SecurityEvents

select * from SecurityEvents order by RecordNumber


a row gets copied to the temp table for every distinct row in the main table. If there is a row in the main table that is not a duplicate it is copied too because it is a distinct row. Now what are you planning to do with that temp table? In your example you show a SELECT, which I think you meant to be on the temp table? But what you started out wanting was to delete the dupes from the original table. You'd now do that by deleting all the rows from the main table and inserting them back from the temp table?
My code does essentially the same, but it only puts rows which are actually dupes into the temp table. Rows that are not dupes are left in the main table.

I should point out that if a high percentage (how high, I don't know, you'll need to test) of the rows are dupes then just moving all the distinct rows including the non-dupes to the temp table and truncating the main table would be faster. But if a fairly low percentage are dupes then my code should be much faster because it will only delete a small number of rows instead of basically copying the entire table to remove those few dupes.

The full code for your way will be like this, test it both ways.

select distinct *
into #Dups
from dbo.SecurityEvents
truncate table dbo.SecurityEvents
insert dbo.SecurityEvents
select * from #Dups
drop table #Dups
Go to Top of Page

SparkByte
Yak Posting Veteran

60 Posts

Posted - 2006-11-30 : 19:33:11
Thanks snSQL, that helps a lot. The first time through I will probably need to do it 'my way' then for continued maintenance your version would be better.

Thanks again, your help is VERY much appreciated.

John F 3

Thank You,

John Fuhrman
http://www.titangs.com
Go to Top of Page
   

- Advertisement -