Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
SparkByte
Yak Posting Veteran
60 Posts |
Posted - 2006-11-30 : 14:14:07
|
I am VERY new to SQL and I am having a heck of a time biulding a script to find and remove duplicate entries.Here is the table structure.
CREATE TABLE [dbo].[SecurityEvents]( [EventLog] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [RecordNumber] [int] NULL, [TimeGenerated] [datetime] NULL, [TimeWritten] [datetime] NULL, [EventID] [int] NULL, [EventType] [int] NULL, [EventTypeName] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [EventCategory] [int] NULL, [EventCategoryName] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [SourceName] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [Strings] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [ComputerName] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [SID] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [Message] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [Data] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL) ON [PRIMARY]GO
This small script seems to eliminate the dupes, but I can't seem to figure out to properly replce the table the with output of the script with all the dupes gone.
select distinct * from dbo.SecurityEventsTest where recordnumber IN (select recordnumber from dbo.SecurityEvents) order by recordnumber
Could someone help??Thank You,John Fuhrmanhttp://www.titangs.com |
|
|
snSQL
Master Smack Fu Yak Hacker
1837 Posts |
Posted - 2006-11-30 : 14:53:19
|
You need to add a primary key or unique index to the table to prevent further dupes. Even if you still allow the duplicates to be inserted into the non-key columns, it will be much easier to remove them if you have a primary key or unique column as well as all these duplicated columns.Here's how you can remove the dupes you have now, but it will perform pretty badly because it has to hold all the dupes in a temp table when you delete them.SELECT [EventLog], [RecordNumber], [TimeGenerated], [TimeWritten], [EventID], [EventType], [EventTypeName], [EventCategory], [EventCategoryName], [SourceName], [Strings], [ComputerName], [SID], [Message], [Data]INTO #DupsFROM [dbo].[SecurityEvents]GROUP BY [EventLog], [RecordNumber], [TimeGenerated], [TimeWritten], [EventID], [EventType], [EventTypeName], [EventCategory], [EventCategoryName], [SourceName], [Strings], [ComputerName], [SID], [Message], [Data]HAVING count(*) > 1DELETE [dbo].[SecurityEvents]FROM [dbo].[SecurityEvents] SINNER JOIN #Dups D ON D.[EventLog] = S.[EventLog] AND D.[RecordNumber] = S.[RecordNumber] AND D.[TimeGenerated] = S.[TimeGenerated] AND D.[TimeWritten] = S.[TimeWritten] AND D.[EventID] = S.[EventID] AND D.[EventType] = S.[EventType] AND D.[EventTypeName] = S.[EventTypeName] AND D.[EventCategory] = S.[EventCategory] AND D.[EventCategoryName] = S.[EventCategoryName] AND D.[SourceName] = S.[SourceName] AND D.[Strings] = S.[Strings] AND D.[ComputerName] = S.[ComputerName] AND D.[SID] = S.[SID] AND D.[Message] = S.[Message] AND D.[Data] = S.[Data]INSERT [dbo].[SecurityEvents]SELECT * FROM #DupsDROP TABLE #Dups |
 |
|
|
X002548
Not Just a Number
15586 Posts |
|
|
snSQL
Master Smack Fu Yak Hacker
1837 Posts |
Posted - 2006-11-30 : 15:08:59
|
| "whoah...easy now, what if they just want 1 occurance?"This will leave exactly one occurence of each each unique row in the table. |
 |
|
|
SparkByte
Yak Posting Veteran
60 Posts |
Posted - 2006-11-30 : 15:39:53
|
What would be the difference with something like this??drop table SecurityEventsTempselect distinct dbo.SecurityEvents.*into dbo.SecurityEventsTemp from SecurityEventsselect * from SecurityEvents order by RecordNumber Then droping the original table and renaming the temp?Also, this table is getting quite large as it hold the security logs from about 65 servers. So performance could be an issue in a few months.Thanks!Thank You,John Fuhrmanhttp://www.titangs.com |
 |
|
|
SparkByte
Yak Posting Veteran
60 Posts |
Posted - 2006-11-30 : 15:41:28
|
| Oh, this is SQL 2000,Sorry.Thank You,John Fuhrmanhttp://www.titangs.com |
 |
|
|
snSQL
Master Smack Fu Yak Hacker
1837 Posts |
Posted - 2006-11-30 : 16:00:04
|
quote: Originally posted by SparkByte What would be the difference with something like this??drop table SecurityEventsTempselect distinct dbo.SecurityEvents.*into dbo.SecurityEventsTemp from SecurityEventsselect * from SecurityEvents order by RecordNumber Then droping the original table and renaming the temp?Also, this table is getting quite large as it hold the security logs from about 65 servers. So performance could be an issue in a few months.
Doing it the way you show it, will copy every row (even those that are not duplicated) every time. My code will delete the dupes then add back a single copy of each one. All the rows that are not duplicated will never be deleted or reinserted.If performance is a problem then you should be making sure now that you stop inserting the duplicates in the first place. |
 |
|
|
SparkByte
Yak Posting Veteran
60 Posts |
Posted - 2006-11-30 : 16:30:13
|
Even with the 'select distinct'? This query did seem to write the new test table without the duplicates.I can't even begin to understand the dirrerences between the two methods. Especially since the last programming course I took basic still used line numbering.  Thanks!!!Thank You,John Fuhrmanhttp://www.titangs.com |
 |
|
|
snSQL
Master Smack Fu Yak Hacker
1837 Posts |
Posted - 2006-11-30 : 17:44:38
|
If you do thisdrop table SecurityEventsTempselect distinct dbo.SecurityEvents.*into dbo.SecurityEventsTemp from SecurityEventsselect * from SecurityEvents order by RecordNumber a row gets copied to the temp table for every distinct row in the main table. If there is a row in the main table that is not a duplicate it is copied too because it is a distinct row. Now what are you planning to do with that temp table? In your example you show a SELECT, which I think you meant to be on the temp table? But what you started out wanting was to delete the dupes from the original table. You'd now do that by deleting all the rows from the main table and inserting them back from the temp table?My code does essentially the same, but it only puts rows which are actually dupes into the temp table. Rows that are not dupes are left in the main table. I should point out that if a high percentage (how high, I don't know, you'll need to test) of the rows are dupes then just moving all the distinct rows including the non-dupes to the temp table and truncating the main table would be faster. But if a fairly low percentage are dupes then my code should be much faster because it will only delete a small number of rows instead of basically copying the entire table to remove those few dupes.The full code for your way will be like this, test it both ways.select distinct *into #Dupsfrom dbo.SecurityEventstruncate table dbo.SecurityEventsinsert dbo.SecurityEventsselect * from #Dupsdrop table #Dups |
 |
|
|
SparkByte
Yak Posting Veteran
60 Posts |
Posted - 2006-11-30 : 19:33:11
|
| Thanks snSQL, that helps a lot. The first time through I will probably need to do it 'my way' then for continued maintenance your version would be better.Thanks again, your help is VERY much appreciated.John F 3Thank You,John Fuhrmanhttp://www.titangs.com |
 |
|
|
|
|
|
|
|