Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
Taragor
Starting Member
46 Posts |
Posted - 2007-10-19 : 11:22:34
|
Good Morning Everyone,I'm looking for some suggestions as to the best possible approachfor some data comparisson transactions.Table 1: Commulative Data Currently approximately 25million recordsTable 2:Monthly temporary processing tableApproximately 1.5million recordsRequirement:Based on a 3 field criteria, I need to do a lookup in table 1for each record in table 2 and see if this record already exists.If the record exists, I need to replace the data in table 1 withthe data from table 2.Difficulties:Table 1 has indexes that aren't setup on the criteria fields, and these indexes can't be changed since queries are done based on other criterias and this would slow down those lookups.My current considerations are:1) Create a third table that contains only the identifying key and 3 criteria fields from table 1 (concatinated together so basically 1 id field and 1 field with the 3 values) and do all table 2 lookups against this table. This would allow me to index the criteria field to maximize the lookup process.*This means duplicated data and will require more storage space2) Do the lookup directly against table 1.*My concern is that having to search 1.5million times through 25million + records is going to be a nightmare.3) Prior to processing, create a temporary table with the id field and concatinated criteria fields (see example 1) and do lookups against this table. Once processing is completed, this table can be deleted.*Increased processing time to create the temporary table**Storage increase requirement will only be during production processing.Note:This process will only run once a month and is done during off hours where the DB is not being accessed by anyone.Do any of the above strategies make sense or am I completely out in left field? :)Any and all suggestions are appreciated.Tar |
|
Kristen
Test
22859 Posts |
Posted - 2007-10-20 : 04:41:59
|
"Table 1 has indexes that aren't setup on the criteria fields, and these indexes can't be changed since queries are done based on other criterias and this would slow down those lookups."Why would adding indexes slow down existing queries?Are you basically saying:UPDATE T1SET Col1 = T2.Col1, ...FROM Table1 AS T1 JOIN Table2 AS T2 ON T2.MyPK = T1.MyPKWHERE T1.Col1 <> T2.Col1 OR T1.Col2 <> T2.Col2 ... Doing an update like this for in excess of 1 million records may be problematic (slow / strain on TLog file size, etc.), and it may be better to batch it into, say, 100,000 updates per iteration. (Testing will tell what Batch size is reasonable)However, if you do it in batches it won't be atomic, nor possible to ROLLBACK.But if its out-of-hours, as you said, then this shouldn't be a problem.Kristen |
 |
|
Taragor
Starting Member
46 Posts |
Posted - 2007-10-22 : 09:37:36
|
quote: Originally posted by Kristen "Why would adding indexes slow down existing queries?"
That's a good question. I may have been hasty in disregarding the index. That being said, I have a very noobish question. What is the priority in indexing? (Clustered index obviously being the primary index) are subsequent indexes prioritized by creation?IE:Index 1 = Clustered field/sIndex 2 = First index created that isn't clusteredIndex 3 = Second index created that isn't clusteredetc???Or is there another factor that comes into play as to the order of the index?The table itself currently has 1 clutered index and 2 non clustered indexes. Will adding a 3rd non clustered index really make a difference in lookup time?Tar |
 |
|
|
|
|
|
|