Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 Import data Edge condition

Author  Topic 

Kristen
Test

22859 Posts

Posted - 2004-08-20 : 14:04:00
Definition: UpSert is the proces sof using UPDATE or INSERT to get imported data into the system - depending on whether a row already exists, or not.

I have to transfer some data from an Oracle system, via a (local) SQL box - the "Integration Server" [in the client's office] - to their database on their web server.

The plan is that the Integration Server queries Oracle for anything that has changed (in the relevant tables) - Changed Date colums are available.

Then it compares that against some "Import Tables". The Import Tables are a permanaent copy of the data we have previously imported from Oracle. They only contain the columns we are interested in - so a record marked as Changed on Oracle does not necessarily contain changes in COLUMNS we are interested in.

If a record is different it is UpSerted into the Import Table - and a "Flag" column, in the Import Table, is set to indicate that a change has been made to that record.

Then the Web Server comes along periodically and pulls all Flagged Rows to the web server, and inserts them into the Web Database.

I am assuming that the fact that the Integration Server put a record in the Import Table means that the Web Server should insert it into the Web Database - no further checks are necessary.

However, how does the Web Server know which rows to pull? i.e. what is "new" since the last time it executed? In effect: How does the New Flag work?

I was thinking that the Import Tables could use a datetime for the Flag column - so each row UpSerted into the Integration Server's Import Tables would have an "UpdatedDate" set to the current server time, and the Web Server could pull everything since the last time it knew it had pulled data (i.e. the MAX(UpdatedDate) of records pulled so far).

However I'm worried about what happens when the clocks go back/forward ("Daylight saving time" to our descendents across the pond).

I'm unlikley to be in control of how the Integartion and Web database servers are set up, and whether they run permanaently on GMT or something else.

So I'm thinking that I should replace "UpdatedDate" with "BatchNumber" along the lines of:

Current Batch Number is stored [in a "Config" table] on the integration server. Whenever a record is UpSerted on the Integration server it gets the current BatchNumber.

When the Web Server pulls some data it FIRST increments the Batch Number, and then pulls all rows with a BatchNumber higher than the last time it ran.

Once the batch number is incremented any new activity on the Integration Server will use the new batch number, so the Web Server won't get that data until the next batch run.

However, what happens to the work the Integration server is doing [may be doing] when the Web Server comes calling?

Lets assume the Integration Server is busy pulling 10,000 rows from Oracle and UpSerting them into the integration server using, say, Batch Number 53.

The Web Server has incremented the batch number to 54 before starting and is asking for everything with a batch number of 53 - but the Integration Server is still inserting records with Batch Number 53 - how do the two system get in sync so that no rows are lost?

Hmmm ... perhaps the Integration Server should be in charge of the Batch Number ... it checks the Batch number before doing its UpSerts, and UpSerts using the current batch number +1. When it has finished it stores the new batch number that it just used; the Web Server checks the Batch number when it starts and pulls anything higher than the batch number of its previous run but "Less than or Equal to" the current batch number stored in the Config Table. Thus it does NOT pull anything that the integration server is busy UpSerting.

Nope, that doesn't work ... a row that was UpSerted with an earlier Batch Number, but which has NOT YET been pulled by the Web Server, might be re-updated within the latest Batch Number. That record will NOT get pulled by the Web Server (batch number too high) but that row might be critical to the Web Server for Referential Integrity.

Damn, I'm back at square one.

Perhaps the Integration Server should pull all new data into temporary tables and then update its local tables "all in one go" so the web server pulls either BEFORE the complete update or AFTER it. I suppose I can just do a BEGIN TRANSACTION on the integration server before the massive UpSert ... but its likely to be of the order of 250,000 rows - will that put a big strain on Logs and Table locks?

Damn this is scary ... thank goodness our descendents across the pond haven't got to margaruita time yet!

Kristen

Kristen
Test

22859 Posts

Posted - 2004-08-20 : 18:00:02
No takers?

Kristen
Go to Top of Page

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2004-08-21 : 14:01:10
this maybe stupid, but i just awoke. :))
since we in europe know when the time changes in daylight savings time, couldn't you somehow add or subtract one hour from used conditions??

Go with the flow & have fun! Else fight the flow :)
Go to Top of Page

derrickleggett
Pointy Haired Yak DBA

4184 Posts

Posted - 2004-08-21 : 16:18:00
Kristen, why don't you just have the batch number batch master table. In your intermediate tables store batch number, insert/update flag/pulled flag. In a batch detail table, have a batch number/import begin flag/import complete flat/export begin flag/export complete flag. This lets the integration server "drive the whole process". You then don't have to worry about times. You could be 20 hours different. The "subscriber" is just looking for items that have an import complete and no export begin flag checked. This could also be useful if you have multiple subscribers in the future. You could add a subscriber_id to your batch detail table. You could have a batch/subscriber table. All kinds of kewl things you could do with this.

I use this method a lot for import processes, synchronization process, and maintenance processes such as EMC SQL clones.

MeanOldDBA
derrickleggett@hotmail.com

When life gives you a lemon, fire the DBA.
Go to Top of Page

kselvia
Aged Yak Warrior

526 Posts

Posted - 2004-08-22 : 05:12:57
I think I agree with Derrick. When we do this update where flag = 'P' (pending) to 'I' (in progress), move the 'I' flagged records, then set the flag to 'C' (complete). Any new rows with 'P' flags will be processed the next time around no matter how log it took to process the 'I' records.


--Ken
I want to die in my sleep like my grandfather, not screaming in terror like his passengers.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2004-08-23 : 04:01:28
Thanks guys. Once I woke up to the potential daylight-saving-time-snafu I thought Batch Number was the answer, but just wanted to run it buy you.

I'm a bit of a detail person, so Derrick's heavy-weght solutions is looking good!

Looks like driving it all fromt he integration server is important so that I can control data being passed in "complete batches" so as to avoid any R.I. issues on the receiving end.

Kristen
Go to Top of Page
   

- Advertisement -