Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 Looking for Suggestions

Author  Topic 

Taragor
Starting Member

46 Posts

Posted - 2007-10-19 : 11:22:34

Good Morning Everyone,

I'm looking for some suggestions as to the best possible approach
for some data comparisson transactions.

Table 1:
Commulative Data
Currently approximately 25million records

Table 2:
Monthly temporary processing table
Approximately 1.5million records

Requirement:
Based on a 3 field criteria, I need to do a lookup in table 1
for each record in table 2 and see if this record already exists.
If the record exists, I need to replace the data in table 1 with
the data from table 2.

Difficulties:
Table 1 has indexes that aren't setup on the criteria fields, and these indexes can't be changed since queries are done based on other criterias and this would slow down those lookups.

My current considerations are:

1) Create a third table that contains only the identifying key and 3 criteria fields from table 1 (concatinated together so basically 1 id field and 1 field with the 3 values) and do all table 2 lookups against this table. This would allow me to index the criteria field to maximize the lookup process.

*This means duplicated data and will require more storage space

2) Do the lookup directly against table 1.

*My concern is that having to search 1.5million times through 25million + records is going to be a nightmare.

3) Prior to processing, create a temporary table with the id field and concatinated criteria fields (see example 1) and do lookups against this table. Once processing is completed, this table can be deleted.

*Increased processing time to create the temporary table
**Storage increase requirement will only be during production processing.

Note:

This process will only run once a month and is done during off hours where the DB is not being accessed by anyone.

Do any of the above strategies make sense or am I completely out in left field? :)

Any and all suggestions are appreciated.


Tar

Kristen
Test

22859 Posts

Posted - 2007-10-20 : 04:41:59
"Table 1 has indexes that aren't setup on the criteria fields, and these indexes can't be changed since queries are done based on other criterias and this would slow down those lookups."

Why would adding indexes slow down existing queries?

Are you basically saying:

UPDATE T1
SET Col1 = T2.Col1, ...
FROM Table1 AS T1
JOIN Table2 AS T2
ON T2.MyPK = T1.MyPK
WHERE T1.Col1 <> T2.Col1 OR T1.Col2 <> T2.Col2 ...

Doing an update like this for in excess of 1 million records may be problematic (slow / strain on TLog file size, etc.), and it may be better to batch it into, say, 100,000 updates per iteration. (Testing will tell what Batch size is reasonable)

However, if you do it in batches it won't be atomic, nor possible to ROLLBACK.

But if its out-of-hours, as you said, then this shouldn't be a problem.

Kristen
Go to Top of Page

Taragor
Starting Member

46 Posts

Posted - 2007-10-22 : 09:37:36
quote:
Originally posted by Kristen

"Why would adding indexes slow down existing queries?
"



That's a good question.

I may have been hasty in disregarding the index. That being said, I have a very noobish question. What is the priority in indexing? (Clustered index obviously being the primary index) are subsequent indexes prioritized by creation?

IE:

Index 1 = Clustered field/s
Index 2 = First index created that isn't clustered
Index 3 = Second index created that isn't clustered
etc???

Or is there another factor that comes into play as to the order of the index?

The table itself currently has 1 clutered index and 2 non clustered indexes. Will adding a 3rd non clustered index really make a difference in lookup time?


Tar
Go to Top of Page
   

- Advertisement -