Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Insert statement to avoid duplicates

Author  Topic 

matty1stop
Starting Member

37 Posts

Posted - 2007-05-15 : 11:43:08
I am exporting data from a warehouse to a datamart. So on a daily basis I want to insert only new records into the datamart.

I know there are many factors to consider but in general, would something like the following example be the most efficient approach? Each table has millions of records and I'm trying to improve performance.

P.S. I've used the DB Tuning advisor and applied the recommendations, but it is still taking a very long time (over an hour) to run.

Thanks for the help,

Matt


insert into table1 t1a
select * from table2 t2
where not exists
(select * from table1 t1b
where t1b.field1 = t2.field1
and t1b.field2 = t2.field2)

harsh_athalye
Master Smack Fu Yak Hacker

5581 Posts

Posted - 2007-05-15 : 11:48:57
I would prefer LEFT JOIN over not exists.

Insert into table1
select * from table2 t2 LEFT join table1 t1
on t1.field1 = t2.field1 and t1.field2 = t2.field2
Where t1.field1 is NULL and t1.field2 is NULL



Harsh Athalye
India.
"The IMPOSSIBLE is often UNTRIED"
Go to Top of Page

matty1stop
Starting Member

37 Posts

Posted - 2007-05-15 : 12:04:13
When I limit the data using a filter on one field the performance of the EXISTS query is quicker than when I try using the LEFT JOIN. (0:48 secs to return 55 records for LEFT JOIN vs 0:38 secs for NOT EXISTS) Would this performace degrade when returning the greater number of records to make the LEFT JOIN quicker, or is there something I am missing?
Go to Top of Page

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2007-05-15 : 12:24:05
I'm with Harsh on my preference for LEFT OUTER JOINs for consistant syntax and because they tend to perform better on larger tables. However,this is one of those things with so many variables it can be hard to predict. The only way to say which is better is to try each method and see what happens.

This is only a guess, but I would expect the NOT EXISTS to perform better for a small insert like your test.

-Ryan
Go to Top of Page
   

- Advertisement -