
 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 Checking for duplicates

Author  Topic 

vwilsonjr
Starting Member

45 Posts

Posted - 2003-05-14 : 13:01:31
I'm trying to run a script that will show me the duplicate records I have. The table has 3 columns (UserID - RoleID - SiteID). I have users with the same RoleID. I know how to do it comparing two tables, but I can't figure out how to do it with one. Thanks in advance for your help.
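For reference, the standard single-table duplicate check is a GROUP BY with a HAVING clause. The table name dbo.UserRoles below is made up, since the post doesn't give one; the columns are the ones named in the question:

```sql
-- Hypothetical table name; the columns come from the question.
SELECT UserID, RoleID, SiteID, COUNT(*) AS DupCount
FROM dbo.UserRoles
GROUP BY UserID, RoleID, SiteID
HAVING COUNT(*) > 1
```

Drop a column from the SELECT and GROUP BY lists if it is allowed to differ between otherwise-duplicate rows.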

Thank God for Forums.

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2003-05-14 : 13:03:56
You should always search the articles and the forums because a lot of topics have already been covered:

[url]http://www.sqlteam.com/item.asp?ItemID=3331[/url]

Tara
Go to Top of Page

pmr
Starting Member

37 Posts

Posted - 2003-05-14 : 14:38:56
That article is really good. It mentions something at the end:
quote:

It's actually possible to combine the SELECT INTO #dupes and the DELETE into one DELETE statement.

here's an example:
DELETE CardRead.dbo.tblChurchRaw FROM CardRead.dbo.tblChurchRaw
LEFT OUTER JOIN ( --{BEGIN DupsKept}
    SELECT ID, MAX(SwipeDate) AS SwipeDate FROM CardRead.dbo.tblChurchRaw
    GROUP BY ID HAVING COUNT(*) > 1
) DupsKept
    ON CardRead.dbo.tblChurchRaw.ID = DupsKept.ID --More columns can be added here
    AND CardRead.dbo.tblChurchRaw.SwipeDate = DupsKept.SwipeDate
INNER JOIN ( --{BEGIN DupsAll}
    SELECT ID FROM CardRead.dbo.tblChurchRaw GROUP BY ID HAVING COUNT(*) > 1
) DupsAll
    ON CardRead.dbo.tblChurchRaw.ID = DupsAll.ID --More columns can be added here
WHERE DupsKept.ID IS NULL


The [SwipeDate] field is the unique field and the [ID] field is the one that gets duplicated.
There was actually another field where I wanted to keep the last value, which is why I used MAX(SwipeDate). I also noted where you can add more duplicate columns to the join statements as necessary. This approach also does not require a temp table to be populated first.


The article also mentions:
quote:
If you have a large table the join statement can get kind of large. I guess that's the price you pay for letting duplicates into your database.


It might be worth mentioning here that it is always a good idea to put restrictions in the database structure that keep bad data from ever existing. In this situation, a unique index covering all of the fields that should never contain duplicates would accomplish this.
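As a sketch, such an index might look like this; the index and table names are hypothetical, and the column list should cover whatever combination of fields must stay unique:

```sql
-- Hypothetical names; one unique index over every column that must be unique as a set.
CREATE UNIQUE INDEX IX_UserRoles_NoDups
ON dbo.UserRoles (UserID, RoleID, SiteID)
```

Note that any existing duplicates have to be deleted before the index can be created.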



Edited by - pmr on 05/14/2003 14:40:37
Go to Top of Page

vwilsonjr
Starting Member

45 Posts

Posted - 2003-05-14 : 21:15:19
Point taken, tduggan. I did a search on the forum but didn't think about searching the SQLTeam pages. I will do that in the future.

pmr, yeah, it would be a good idea to make it so dups can't be added. However, this database is driving an ASP site I didn't develop, and I can't really change the database restrictions and still get support on the app when needed. Plus, as the saying goes, data is only as good as the idiots entering it. But thanks for your help.

Thank God for Forums.
Go to Top of Page

pmr
Starting Member

37 Posts

Posted - 2003-05-14 : 22:12:23
Yep, the data is no better than the people entering it, but someone is going to blame you anyway. You can make your life a bit simpler:

I've spent most of my SQL career in the same situation and, as a result, have chosen not to make many changes to my database structure that I otherwise would. This is one change I would make, because it wouldn't do anything except prevent duplicates from being added to the database. The worst thing that could happen is that a process would fail because it couldn't add the duplicate record it wanted; usually that is a good thing. The second worst thing is that the user trying to insert or update would see an error message when saving, something like: "This process could not be completed because it would create a duplicate in one or more indexes." It is definitely better for the "idiot" to see the error they are causing than to get away with it.

Now that you know it is harmless and helpful at the same time, here is how you deal with the vendor: when you run into a problem on the database that you know is their problem, delete your index and verify that THEIR database still causes the problem the way THEY built it. Then you can show them what THEIR database is doing the way THEY built it, and they can fix THEIR database. After they are done, you can put back the index that you need for live processing.

Adding and deleting the index is really easy, too. All you need to do is create the index and then script the table it is on. You can then delete all of the generated code except the CREATE INDEX statement for your index. Add the DROP INDEX statement you need and keep it all in a .sql file for future use. Any other modifications that need to be dropped in order to test the database in its unmodified state can be kept in a file that can be run all at once.
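A minimal version of such a script might look like the following; the index name is hypothetical, and the table and columns are borrowed from the earlier example:

```sql
-- Drop the locally-added index before demonstrating a vendor problem.
-- (SQL Server 2000 DROP INDEX syntax is table.index)
DROP INDEX tblChurchRaw.IX_tblChurchRaw_NoDups

-- Re-create it afterward for live processing.
CREATE UNIQUE INDEX IX_tblChurchRaw_NoDups
ON dbo.tblChurchRaw (ID, SwipeDate)
```

Keeping both statements in one .sql file makes it easy to flip the database between its modified and unmodified states.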

Hmmm, after writing all of this I double-checked your field names. You may be better off just deleting the dups and eliminating the error-message trouble calls from your - well - users, while you and they live in ignorant bliss.

The above can still be done in the right situation (for example, when other innocent fields in the same table might hold valuable data). Another reader might find this valuable.

Peter

Go to Top of Page