Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Dedupe a table

Author  Topic 

Yellowdog
Starting Member

34 Posts

Posted - 2009-11-09 : 18:09:58
So I am trying to dedupe a table and I need some help. Now I know that a table should have keys ect to keep dupes out of the table but this is a hold table where raw data is going in and getting cleaned.

Here is what I have gotten so far.


select * into #temp
from details d

where exists

(
select * from details as dd
where

dd.id = d.id and
dd.number = d.number and
dd.line_item = d.line_item and
dd.op_code = d.op_code and
dd.op_description = d.op_description and
dd.labor_cost = d.labor_cost and
dd.labor_sale = d.labor_sale and
dd.misc_cost = d.misc_cost and
dd.misc_sale = d.misc_sale and
dd.part_cost = d.part_cost and
dd.part_sale = d.part_sale and
dd.pay_type = d.pay_type and
dd.tech_name = d.tech_name and
dd.advisor_number = d.advisor_number and
dd.store_id = d.store_id and
dd.detailsid > d.detailsid
)

delete from details where detailsid in (select distinct detailsid from #temp)

drop table #temp_details


The problem I am having is getting the numbers to jive with what I have below and I dont know why.

select id, number, line_item, op_code, op_description, labor_cost, labor_sale, misc_cost, misc_sale, part_cost, part_sale, pay_type, tech_name, advisor_number, store_id, count(*) from
downloaded_details_data
group by id, number, line_item, op_code, op_description, labor_cost, labor_sale, misc_cost, misc_sale, part_cost, part_sale, pay_type, tech_name, advisor_number, store_id
HAVING count(*) > 1


I think they are the same thing but are giving different results.

Any help would be great in helping me understand what I am doing wrong.

Cheers

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2009-11-10 : 12:51:56
they are not the same. presence of nulls in any of fields can cause change in result.
Go to Top of Page

Yellowdog
Starting Member

34 Posts

Posted - 2009-11-10 : 13:37:22
Did not take that into consideration.

For some reason I believe that the second method is best for finding dupes. It just makes more logical sense to me. I am new to deduping so I am not really sure.

I am going to try and build a procedure around the second statement.
Will post when I am done.

If someone else has a good dedupe procedure they would like to share I would appreciate the help.

Cheers
Go to Top of Page

webfred
Master Smack Fu Yak Hacker

8781 Posts

Posted - 2009-11-10 : 14:40:35
What's about DETAILSID?
Can you show some sample records?

What version you're using?

Maybe there is a way to use ROW_NUMBER() and your detailsid...


No, you're never too old to Yak'n'Roll if you're too young to die.
Go to Top of Page
   

- Advertisement -