Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 join for large tables

Author  Topic 

sapient007
Starting Member

5 Posts

Posted - 2007-07-31 : 14:39:11
joining 2 large tables (15million rows table A) with (2million rows Table B). getting new server is currently not an option. on top of that we are joining on ~30 columns with some of them with lots of NULLS.

this is actually a script to clean up the DB from duplicates and the smaller table is actually inserted with all duplicates from the larger table to be re-inserted back when it's cleaned up. The issue i'm having is that during the delete join statement it's taking almost 7hrs and it's a pain to wait for it to get it done.

anyway to shorten it?? i'm currently joining A to B. i'm under the impression that during inner join, the order/position of the table doesn't matter. am i wrong?

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-07-31 : 14:41:14
A query would help.
A description of which indexes you have, would help too.



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

sapient007
Starting Member

5 Posts

Posted - 2007-07-31 : 14:49:00
aight, i did some quick global replace. B is the temp table to remove dupes. table A actually has more columns I'm only interested in those listed.

truncate table B

insert into B (AID, BG, DD, DT, ea, dcd, emct, GID, hle, hgs, hsid, [ID], AID, BG, DD, DT, ea, dcd, emct, GID, hle, hgs, hsid, [ID]) (select AID, BG, DD, DT, ea, dcd, emct, GID, hle, hgs, hsid, [ID], AID, BG, DD, DT, ea, dcd, emct, GID, hle, hgs, hsid, [ID] having count(*) >1 )

delete from A from B inner join A on
A.AID= B.AID AND
A.BG= B.BG AND
A.DD= B.DD AND
A.DT= B.DT AND
A.ea= B.ea AND
A.dcd= B.dcd AND
A.emct= B.emct AND
A.GID= B.GID AND
A.hle= B.hle AND
A.hgs= B.hgs AND
A.hsid= B.hsid AND
A.[ID]= B.[ID] AND
A.AID= B.AID AND
A.BG= B.BG AND
A.DD= B.DD AND
A.DT= B.DT AND
A.ea= B.ea AND
A.dcd= B.dcd AND
A.emct= B.emct AND
A.GID= B.GID AND
A.hle= B.hle AND
A.hgs= B.hgs AND
A.hsid= B.hsid


insert into A select * from B


Go to Top of Page

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2007-07-31 : 16:26:49
delete that large will take a long time due to transaction log resizing.

try deleting this in batches of 10000.


_______________________________________________
Causing trouble since 1980
blog: http://weblogs.sqlteam.com/mladenp
Go to Top of Page

sapient007
Starting Member

5 Posts

Posted - 2007-07-31 : 16:38:12
how do i delete in batches of 1000??


do i cap dupe table size to 1000 rows with the insert into xx (select top 10000 ) caluse??

or is there a way to only join the first 10000 rows?
Go to Top of Page

sapient007
Starting Member

5 Posts

Posted - 2007-07-31 : 16:40:17
on a side note, is there a way to avoid transaction log all together on the delete/join statement since table restore will not ever be needed??
Go to Top of Page

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2007-07-31 : 16:44:21
1.
here's a noce thread covering that:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=77084

2.
no. you can't avoid tran log.

_______________________________________________
Causing trouble since 1980
blog: http://weblogs.sqlteam.com/mladenp
Go to Top of Page
   

- Advertisement -