
 Remote Server Query Performance


Kristen
Test

22859 Posts

Posted - 2004-08-15 : 13:19:12
I've got a query to a remote server - pulling data to be INSERTed locally. There's a 2MB line between the two servers, there is an ongoing requirement to transfer new data hourly, and periodically tens of thousands of new rows will be "added" at the remote end.

Anyway, I thought I would do a test - any comments you have would be appreciated. I restricted the transfer to 10,000 rows, based on the UPDATE DATE at the Remote end, which is representative of how we will get updated rows in the future.

-- Get most recent Changed Date on Local DB
SELECT @intCHANGE_DATE = MAX(CHANGE_DATE)
FROM dbo.MyTable

-- Test 1
SELECT TOP 10000 *
INTO #TEMP_MyTable
FROM MyRemoteServer.MyRemoteDB.dbo.MyTable
WHERE CHANGE_DATE >= COALESCE(@intCHANGE_DATE, 0)
ORDER BY CHANGE_DATE
-- 1m38s

-- Test 2
SELECT *
INTO #TEMP_MyTable
FROM
(
SELECT TOP 10000 *
FROM MyRemoteServer.MyRemoteDB.dbo.MyTable
WHERE CHANGE_DATE >= COALESCE(@intCHANGE_DATE, 0)
ORDER BY CHANGE_DATE
) X
-- 1m32s

-- Test 3 - Uses ##TEMP to be in scope
SELECT @strSQL =
'SELECT *
INTO ##TEMP_MyTable
FROM OPENQUERY(MyRemoteServer,
''
SELECT TOP 10000 *
FROM MyRemoteDB.dbo.MyTable
WHERE CHANGE_DATE >= ' + CONVERT(VARCHAR(20), @intCHANGE_DATE) + '
ORDER BY CHANGE_DATE
'')'
SELECT [SQL] = @strSQL
EXEC (@strSQL)
-- 1m23s

So the OPENQUERY() method looks to be about 10% faster - which is not what I was expecting.

Kristen

nr
SQLTeam MVY

12543 Posts

Posted - 2004-08-15 : 16:55:48
Does it have to generate the query plans for this or are they already there?
For the first one it would have to get the datatype of CHANGE_DATE from the remote server and then convert @intCHANGE_DATE to send it across (try it with character fields and you might find something very different - like it transferring the whole table).
With the OPENQUERY it will just send the query across to be executed on the remote server. This would always be my preferred method (well, a combination of 2 and 3 - sending the TOP to the remote server inside the OPENQUERY but with a temp table rather than a global temp) as it stops people putting in cross-server joins when they change the query and killing both servers.
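A sketch of that combination (local #temp instead of a global one): create the temp table first, then capture the dynamic OPENQUERY result set with INSERT ... EXEC, which runs in the outer scope where the #temp is still visible. Names follow Kristen's example; the column list is hypothetical.

```sql
-- Create the local temp table up front so it survives the dynamic EXEC
CREATE TABLE #TEMP_MyTable (ID INT, CHANGE_DATE INT)  -- hypothetical columns

DECLARE @intCHANGE_DATE INT
DECLARE @strSQL VARCHAR(1000)

SELECT @strSQL =
'SELECT *
FROM OPENQUERY(MyRemoteServer,
''SELECT TOP 10000 ID, CHANGE_DATE
FROM MyRemoteDB.dbo.MyTable
WHERE CHANGE_DATE >= ' + CONVERT(VARCHAR(20), @intCHANGE_DATE) + '
ORDER BY CHANGE_DATE'')'

-- INSERT ... EXEC captures the result set into the local temp table
INSERT INTO #TEMP_MyTable
EXEC (@strSQL)
```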



==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

RoLYroLLs
Constraint Violating Yak Guru

255 Posts

Posted - 2004-08-15 : 21:31:04
Hey Kristen, I am planning on something similar to what you're doing, but not in the thousands of rows. One thought I had: what if two transactions were made in the exact same second? I have a few records with the same exact time, so wouldn't this cause duplicates if, let's say:


# ... Change_Date
----- --------------------
00001 ... 8/15/2004 2:35:12 PM
...
09997 ... 8/15/2004 6:12:26 PM
09998 ... 8/15/2004 6:12:27 PM
09999 ... 8/15/2004 6:12:27 PM
10000 ... 8/15/2004 6:12:27 PM
10001 ... 8/15/2004 6:12:27 PM
10002 ... 8/15/2004 6:12:27 PM
10003 ... 8/15/2004 6:12:28 PM


So on my first request, I'd get to 10,000. But on my next request, with 'CHANGE_DATE >= COALESCE(@intCHANGE_DATE, 0)' wouldn't I get 9,998, 9,999 and 10,000 again? I'd like to avoid this.

Thanks.

- RoLY roLLs

Kristen
Test

22859 Posts

Posted - 2004-08-16 : 01:29:45
RoLY,

If you can be sure that no record with the SAME change date as the most recent record can be added AFTER this task runs, then ordering by the PK fields as well as ChangeDate will ensure that you get an exclusive batch. But you would probably be better off moving to something that knows about the "batches" at that point.
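A sketch of the PK tie-breaker idea (MyPK and @intLAST_PK are hypothetical names): remember both the last ChangeDate AND the last PK transferred, then resume strictly after that (date, key) pair.

```sql
-- Resume after the last (CHANGE_DATE, PK) pair transferred, so rows
-- sharing the boundary ChangeDate are not fetched twice
SELECT TOP 10000 *
FROM MyRemoteServer.MyRemoteDB.dbo.MyTable
WHERE CHANGE_DATE > @intCHANGE_DATE
   OR (CHANGE_DATE = @intCHANGE_DATE AND MyPK > @intLAST_PK)
ORDER BY CHANGE_DATE, MyPK
```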

However, I DO have a situation where I am expecting to have lots of records with IDENTICAL ChangeDates, and I may only get some of them if I pull 10,000 rows at a time and the 10,000th is part of a same-ChangeDate cluster. I'm happy to get the tiddlers at the end again next time (that means I will always get at least one in every table, but hey!)

I am also worried that there may be more than 10,000 rows all with the exact same ChangeDate (there is a customer prospect table, and new entries come from the client bulk-loading mailing lists; I have no idea what their Mega Oracle Back Office System does with ChangeDate in that instance - or why the vendor of that system has a change date that is the number of seconds since 1970 or somesuch - doesn't Oracle have a DateTime datatype? :)
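Incidentally, if the remote ChangeDate really is seconds since 1970, it can be turned into a native datetime on the SQL Server side; a minimal sketch, assuming the value fits in an INT:

```sql
-- Convert a Unix-style epoch value (seconds since 1970-01-01)
-- to a SQL Server DATETIME
DECLARE @intCHANGE_DATE INT
SET @intCHANGE_DATE = 1092576726   -- example value only

SELECT DATEADD(second, @intCHANGE_DATE, '19700101') AS CHANGE_DATETIME
```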

Background (in case someone has a better idea!)

We are building a web site for a client - shopping cart type thingie - peaking at around 1,000 web orders per day.

They have an Oracle back-office order fulfilment system.

We are pulling relevant data from Oracle to a SQL2K "integration" box on their LAN hourly (new/changed customers, products, prices, orders status).

The Oracle system has a reliable ChangeDate indicator - so we can just "pull" everything since the last time we asked.

We don't need every column in their tables, so we then do a Delta against our own tables held on the Integration server, and only update the rows that have actual differences (using a massive opti/pessimistic WHERE clause).
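A sketch of the sort of delta WHERE clause meant here (table and column names are hypothetical); the NULL-safe comparisons for every column of interest are what make it "massive":

```sql
-- Update only rows where at least one column of interest actually differs,
-- treating NULL vs non-NULL as a difference
UPDATE L
SET L.NAME = S.NAME,
    L.PRICE = S.PRICE
FROM dbo.MyTable AS L
JOIN dbo.MyTable_Staging AS S ON S.ID = L.ID
WHERE L.NAME <> S.NAME
   OR (L.NAME IS NULL AND S.NAME IS NOT NULL)
   OR (L.NAME IS NOT NULL AND S.NAME IS NULL)
   OR L.PRICE <> S.PRICE
   OR (L.PRICE IS NULL AND S.PRICE IS NOT NULL)
   OR (L.PRICE IS NOT NULL AND S.PRICE IS NULL)
```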

Then we push the changed rows [in our own tables] up to the web server and "merge" them into the Live data.

We are pulling around 250,000 changed rows per day from Oracle, of which 10,000 are updates to existing records and 66,000 are new (the rest had changes only in columns we weren't interested in).

The Oracle system STILL doesn't properly support XML :-(

Kristen

nr
SQLTeam MVY

12543 Posts

Posted - 2004-08-16 : 11:44:29
Dates are only held in SQL Server to a granularity of about 3 milliseconds, and a lot of inserts/updates can be done in that time.

There are a few methods to ensure that you don't get duplicates if you need to use a time index.
With all of these you need to make sure that the source system updates in time field order or find out the max delay.
Always transfer into a staging table first and keep the data transfer separate from the processing - e.g. one SP gets data into the staging table and a different SP processes the data in the staging table.

Get the max time, subtract an interval from it, and transfer everything up to that time; save that time for the next transfer (greater than).
Get the max time, transfer everything up to that time, ignore entries that have already been transferred by comparing with the PK, and save the max time for the next transfer (greater than or equal).

Note that all these have a potential to transfer a lot of data if the transfer stops for a while. Better to get a block of data then find the max value in that data and use that for the next e.g.

select @dte = dte from trtbl
select top 1000 * into #a from remote where dte >= @dte order by dte - this can be any transfer method: bcp, DTS, query, ...
insert staging select #a.* from #a left join staging on #a.id = staging.id where staging.id is null - compare against staging if you know how many recs to keep
update trtbl set dte = (select max(dte) from #a)

Keep the batch size small so you don't affect the network or remote system too much. This should use an index on the remote system, so it should be quick.
Be careful if the remote system isn't SQL Server, as you may come up against a different time-storage granularity.


==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

Kristen
Test

22859 Posts

Posted - 2004-08-17 : 07:16:18
"Get the max time, subtract an interval from it, and transfer everything up to that time; save that time for the next transfer (greater than)."

This would need to take into account "current time" - otherwise I'll never get the latest record if it was done last Tuesday!
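One way to fold "current time" in (a sketch, variable names hypothetical) is to bound the transfer by now-minus-a-safety-margin rather than by the max timestamp in the data, so the newest rows are eventually transferred even when nothing newer arrives:

```sql
-- Cap the watermark at "now minus a safety margin" so the newest rows
-- are picked up on a later run even if no newer rows ever arrive
DECLARE @dtCUTOFF DATETIME
SET @dtCUTOFF = DATEADD(minute, -5, GETDATE())  -- 5-minute safety margin

SELECT *
FROM MyRemoteServer.MyRemoteDB.dbo.MyTable
WHERE CHANGE_DATE >= @dtLAST_TRANSFER
  AND CHANGE_DATE < @dtCUTOFF
```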

Kristen