SQL Server Forums
 De duping methods

masterdineen
Aged Yak Warrior

United Kingdom
548 Posts

Posted - 01/13/2013 :  16:01:15
Hello there.

Does anyone know of any good de-duping methods between two tables?

Or is there no common or best-practice way of doing it, other than simply de-duping on a unique column that exists in both tables and using subqueries, i.e. (WHERE NOT IN) or (EXISTS)?

I don't have an example yet; I'm just wondering if there are any good ideas.

Any help would be appreciated.

Thank you.

jimf
Flowing Fount of Yak Knowledge

USA
2869 Posts

Posted - 01/13/2013 :  17:23:39
It really depends on what you are trying to accomplish. EXISTS and NOT EXISTS are good options. So are INTERSECT and EXCEPT, as well as MERGE with its WHEN MATCHED and WHEN NOT MATCHED BY SOURCE clauses. The better question is: why do you need to de-dup between two tables?
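
To make the difference between these options concrete, here is a minimal T-SQL sketch. The table variables, column names, and sample rows are hypothetical and only there for illustration; the multi-row VALUES syntax assumes SQL Server 2008 or later. NOT EXISTS compares only the key you name, while EXCEPT and INTERSECT compare every column in the select list.

```sql
-- Hypothetical tables for illustration; names and data are assumptions.
DECLARE @TableA TABLE (CustomerID int PRIMARY KEY, Email varchar(100));
DECLARE @TableB TABLE (CustomerID int PRIMARY KEY, Email varchar(100));

INSERT INTO @TableA VALUES (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
INSERT INTO @TableB VALUES (2, 'b@example.com'), (3, 'changed@example.com'), (4, 'd@example.com');

-- Rows in A whose key has no match in B (key comparison only).
-- Returns CustomerID 1.
SELECT a.CustomerID, a.Email
FROM   @TableA AS a
WHERE  NOT EXISTS (SELECT 1 FROM @TableB AS b WHERE b.CustomerID = a.CustomerID);

-- Rows in A with no identical row in B (all selected columns compared).
-- Returns CustomerID 1 and 3, because row 3 differs in Email.
SELECT CustomerID, Email FROM @TableA
EXCEPT
SELECT CustomerID, Email FROM @TableB;

-- Rows that are identical in both tables.
-- Returns CustomerID 2.
SELECT CustomerID, Email FROM @TableA
INTERSECT
SELECT CustomerID, Email FROM @TableB;
```

Which one fits depends on whether "duplicate" means "same key" or "same row in every column".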

Jim

Every day I learn something that somebody else already knew

Jeff Moden
Aged Yak Warrior

USA
649 Posts

Posted - 01/13/2013 :  21:59:31
JimF touched on many of the methods above. The reason why someone would want to do this is typically in the area of ETL. I consider it foolhardy to try to import data directly into a final table. I think it's much safer to load the data into a staging table, validate it, identify what is new and what must be updated, and only then start adding to or modifying the target table. It usually turns out to be faster as well, because I don't generally have to do joined inserts or updates on a table that is in use. There is no blocking to worry about on the staging table.
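
The staging workflow described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the temp table names, columns, sample rows, and the validation rule (reject rows with a missing key or e-mail) are all hypothetical, and the step that removes duplicates inside the staging table is an added assumption so that the final insert cannot violate the target's primary key.

```sql
-- Hypothetical target and staging tables; all names and data are assumptions.
CREATE TABLE #Customer      (CustomerID int PRIMARY KEY, Email varchar(100) NOT NULL);
CREATE TABLE #CustomerStage (CustomerID int NULL, Email varchar(100) NULL);

INSERT INTO #Customer      VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO #CustomerStage VALUES (2, 'b-new@example.com'), (3, 'c@example.com'),
                                  (3, 'c@example.com'), (NULL, 'bad-row@example.com');

-- 1. Validate: discard rows that cannot be loaded.
DELETE FROM #CustomerStage
WHERE  CustomerID IS NULL OR Email IS NULL;

-- 2. De-duplicate within staging: keep one row per key.
WITH Ranked AS (
    SELECT CustomerID,
           ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY Email) AS rn
    FROM   #CustomerStage
)
DELETE FROM Ranked WHERE rn > 1;

-- 3. Update rows that already exist in the target and have changed.
UPDATE c
SET    c.Email = s.Email
FROM   #Customer AS c
JOIN   #CustomerStage AS s ON s.CustomerID = c.CustomerID
WHERE  c.Email <> s.Email;

-- 4. Insert rows that are new to the target.
INSERT INTO #Customer (CustomerID, Email)
SELECT s.CustomerID, s.Email
FROM   #CustomerStage AS s
WHERE  NOT EXISTS (SELECT 1 FROM #Customer AS c WHERE c.CustomerID = s.CustomerID);

-- Inspect the result.
SELECT CustomerID, Email FROM #Customer ORDER BY CustomerID;
```

Steps 1 and 2 touch only the staging table, so the target is involved only in the two short statements at the end.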

--Jeff Moden
RBAR is pronounced "ree-bar" and is a "Modenism" for "Row By Agonizing Row".

First step towards the paradigm shift of writing Set Based code:
"Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

When writing schedules, keep the following in mind:
"If you want it real bad, that's the way you'll likely get it."

visakh16
Very Important crosS Applying yaK Herder

India
52309 Posts

Posted - 01/13/2013 :  22:30:34
We dump the incoming data into a staging table and then do all the validations, checks, transformations, etc., as Jeff suggested. The data transfer from source to staging is a straight pull. For inserts and updates, we compare datetime fields between source and destination. To compare, we can use several methods:
1. MERGE
2. EXISTS/NOT EXISTS
3. LEFT JOIN / INNER JOIN
4. IN/NOT IN
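
As a rough sketch of the first and third methods in the list above, the example below uses MERGE with a datetime comparison, followed by the LEFT JOIN form of the same "not yet in destination" check. The temp tables, the ModifiedDate column, and the sample rows are hypothetical; MERGE assumes SQL Server 2008 or later.

```sql
-- Hypothetical tables; names, the ModifiedDate column, and data are assumptions.
CREATE TABLE #Target (ID int PRIMARY KEY, Name varchar(50) NOT NULL, ModifiedDate datetime NOT NULL);
CREATE TABLE #Stage  (ID int PRIMARY KEY, Name varchar(50) NOT NULL, ModifiedDate datetime NOT NULL);

INSERT INTO #Target VALUES (1, 'Alpha', '20130101'), (2, 'Beta', '20130101');
INSERT INTO #Stage  VALUES (1, 'Alpha', '20130101'), (2, 'Beta v2', '20130110'), (3, 'Gamma', '20130112');

-- Update rows whose staged copy is newer; insert rows missing from the target.
MERGE #Target AS t
USING #Stage  AS s
    ON t.ID = s.ID
WHEN MATCHED AND s.ModifiedDate > t.ModifiedDate THEN
    UPDATE SET t.Name = s.Name, t.ModifiedDate = s.ModifiedDate
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ID, Name, ModifiedDate) VALUES (s.ID, s.Name, s.ModifiedDate)
OUTPUT $action AS MergeAction, inserted.ID;

-- LEFT JOIN form of "staged rows not yet in the target".
-- Returns no rows here, because the MERGE above has already inserted ID 3.
SELECT s.ID
FROM   #Stage AS s
LEFT JOIN #Target AS t ON t.ID = s.ID
WHERE  t.ID IS NULL;
```

With this sample data the MERGE reports an UPDATE for ID 2 and an INSERT for ID 3; ID 1 is matched but left alone because its staged date is not newer. One caveat on the fourth method: NOT IN returns no rows at all if the subquery produces a NULL, so NOT EXISTS is usually the safer choice when the compared column is nullable.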

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC