
 De duping methods


masterdineen
Aged Yak Warrior

550 Posts

Posted - 2013-01-13 : 16:01:15
Hello there.

Does anyone know of any good de-duping methods between two tables?

Or is there no common or best-practice way of doing it, beyond simply de-duping on a unique column that exists in both tables and using subqueries, i.e. (WHERE NOT IN) or (EXISTS)?

I don't have an example yet; I'm just wondering if there are any good ideas.

Any help would be appreciated.

Thank you.

jimf
Master Smack Fu Yak Hacker

2875 Posts

Posted - 2013-01-13 : 17:23:39
It really depends on what you are trying to accomplish. EXISTS and NOT EXISTS are good options. So are INTERSECT and EXCEPT, as well as MERGE with its WHEN MATCHED and WHEN NOT MATCHED BY SOURCE clauses. The better question is, why do you need to de-dup between two tables?
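
As a rough illustration of the set-based options named above, here is a minimal sketch that runs the queries through Python's built-in sqlite3 module so it can be tried anywhere. The tables `target` and `staging` and their columns are hypothetical, invented for this example. NOT EXISTS, EXCEPT and INTERSECT are written the same way in SQL Server; MERGE is left out because SQLite does not have it.

```python
import sqlite3

# Hypothetical tables, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target  (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE staging (customer_id INTEGER, name TEXT);
    INSERT INTO target  VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO staging VALUES (2, 'Bob'), (3, 'Cy');
""")

# NOT EXISTS: staging rows whose key is not yet in target.
new_rows = conn.execute("""
    SELECT s.customer_id, s.name
    FROM staging AS s
    WHERE NOT EXISTS (SELECT 1 FROM target AS t
                      WHERE t.customer_id = s.customer_id)
    ORDER BY s.customer_id
""").fetchall()

# EXCEPT: whole-row set difference (compares every selected column).
only_in_staging = conn.execute("""
    SELECT customer_id, name FROM staging
    EXCEPT
    SELECT customer_id, name FROM target
""").fetchall()

# INTERSECT: rows that appear identically in both tables.
in_both = conn.execute("""
    SELECT customer_id, name FROM staging
    INTERSECT
    SELECT customer_id, name FROM target
""").fetchall()

print(new_rows, only_in_staging, in_both)
```

Note that NOT EXISTS here compares only the key, while EXCEPT and INTERSECT compare every selected column, so the two approaches can give different answers when a key matches but other columns differ.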

Jim

Everyday I learn something that somebody else already knew

Jeff Moden
Aged Yak Warrior

652 Posts

Posted - 2013-01-13 : 21:59:31
JimF touched on many of the methods above. The reason someone would want to do this is typically in the area of ETL. I consider it foolhardy to try to import data directly into a final table. I think it's much safer to load the data into a staging table, validate it, identify what is new and what must be updated, and only then start adding to or modifying the target table. It usually turns out to be faster as well, because I don't generally have to do joined inserts or updates on a table that is in use. There is no blocking to worry about on the staging table.
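
The staging workflow described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module so it is runnable, the tables `target` and `staging` are hypothetical, and the validation rules (reject NULLs, keep one row per key) are invented stand-ins for whatever checks a real load would need.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target  (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE staging (id INTEGER, email TEXT);
    INSERT INTO target  VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO staging VALUES (2, 'b2@example.com'), (3, 'c@example.com'),
                               (3, 'c@example.com'), (4, NULL);
""")

with conn:  # commit all steps together, or roll back on error
    # 1. Validate (assumed rule): discard rows with missing values.
    conn.execute("DELETE FROM staging WHERE id IS NULL OR email IS NULL")

    # 2. De-dup inside staging (assumed rule): keep the first row per id.
    conn.execute("""
        DELETE FROM staging
        WHERE rowid NOT IN (SELECT MIN(rowid) FROM staging GROUP BY id)
    """)

    # 3. Update target rows whose staged value differs.
    conn.execute("""
        UPDATE target
        SET email = (SELECT s.email FROM staging AS s WHERE s.id = target.id)
        WHERE EXISTS (SELECT 1 FROM staging AS s
                      WHERE s.id = target.id AND s.email <> target.email)
    """)

    # 4. Insert staged rows whose key is not in target yet.
    conn.execute("""
        INSERT INTO target (id, email)
        SELECT s.id, s.email
        FROM staging AS s
        WHERE NOT EXISTS (SELECT 1 FROM target AS t WHERE t.id = s.id)
    """)

final_rows = conn.execute("SELECT id, email FROM target ORDER BY id").fetchall()
print(final_rows)
```

All of the cleanup work in steps 1 and 2 touches only the staging table; the target table is modified only by the final update and insert.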

--Jeff Moden
RBAR is pronounced "ree-bar" and is a "Modenism" for "Row By Agonizing Row".

First step towards the paradigm shift of writing Set Based code:
"Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

When writing schedules, keep the following in mind:
"If you want it real bad, that's the way you'll likely get it."

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2013-01-13 : 22:30:34
We dump the incoming data into a staging table and then do all validations, checks, transformations, etc., as Jeff suggested. The data transfer from source to staging is a straight pull. For inserts and updates, we use datetime fields to compare source and destination. To compare, we can use several methods:
1. MERGE
2. EXISTS/NOT EXISTS
3. LEFT JOIN / INNER JOIN
4. IN/NOT IN
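
One practical difference between the methods listed above is how they behave when the compared column contains NULL. The sketch below, which assumes hypothetical `target` and `staging` tables and uses Python's built-in sqlite3 module so it can be run, compares NOT IN, NOT EXISTS and a LEFT JOIN anti-join on the same data. SQL Server applies the same three-valued logic to NOT IN.

```python
import sqlite3

# Hypothetical tables; target deliberately contains a NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target  (code TEXT);
    CREATE TABLE staging (code TEXT);
    INSERT INTO target  VALUES ('A'), (NULL);
    INSERT INTO staging VALUES ('A'), ('B');
""")

# NOT IN: a NULL in the subquery makes the predicate unknown for 'B',
# so no rows come back at all.
not_in = conn.execute("""
    SELECT code FROM staging
    WHERE code NOT IN (SELECT code FROM target)
""").fetchall()

# NOT EXISTS: NULLs in target simply never match, so 'B' is returned.
not_exists = conn.execute("""
    SELECT s.code FROM staging AS s
    WHERE NOT EXISTS (SELECT 1 FROM target AS t WHERE t.code = s.code)
""").fetchall()

# LEFT JOIN anti-join: keep staging rows that found no partner.
left_join = conn.execute("""
    SELECT s.code
    FROM staging AS s
    LEFT JOIN target AS t ON t.code = s.code
    WHERE t.code IS NULL
""").fetchall()

print(not_in, not_exists, left_join)
```

If the compared column can be NULL, NOT EXISTS or the LEFT JOIN form is usually the safer choice; NOT IN is fine when the column is declared NOT NULL.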

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
