Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Remote process failing - default Timeout??

Author  Topic 

Kristen
Test

22859 Posts

Posted - 2010-10-20 : 05:40:38
I have an Sproc "MasterTransfer" which does:

EXEC @intErrNo = RemoteServer.RemoteDatabase.dbo.PullDataFromOracle

EXEC @intErrNo = dbo.PullDataFromRemoteServer_ToStagingTables

EXEC @intErrNo = dbo.ProcessDataFromStagingTables

This Sproc is scheduled to run half-hourly.

Between 04:00 and 06:59 the remote PullDataFromOracle Sproc pulls data from an additional Oracle table. It fails every time (I presume at this extra step, which the logging suggests is the case - preceding step logs a result, the big oracle query is not in the log table)

I ran it manually and it ran fine, however the extra Oracle pull took 12 minutes to run (and pulled 140,000 rows).

I can't see that there is anything wrong with the Sproc.

Note that the Sproc runs on the half-hour, so if there was some blocking on Oracle (say) that should only effect some of the runs between 04:00 and 06:59 (unless there is something going on the whole of that period, my gut feeling is that is unlikely)

Is it possible that there is a default timeout on the EXEC of the RemoteSproc ?

Nothing in the logs of any note (neither Remote nor Local SQL Server Error log, nor SQL Server Agent log)

Would a Network Error show up in the logs?

Ordinarily, when it works without the additional big Oracle query, the RemoteSproc takes 9 minutes - seems long enough to have failed if there was a timeout? So in total, when it fails, it would need to run for about 22 minutes.

When RemoteSproc fails the remainder of the local Sprocs still run - even though every Sproc calls has a check for Return Value AND for @@ERROR

Description of process:

MasterTransfer calls an Sproc (PullDataFromOracle) on the Remote Server which pulls data from a variety of tables in a yet-more-remote Oracle database into staging tables on RemoteServer.

PullDataFromOracle calls a separate Child Sproc for each table to be transferred from Oracle, and one of those has an extra Oracle query if run between 04:00 and 06:59. Each child Sproc logs success to a logging table (hence I think it is possible to deduce that the next-sproc-after-last-one-logged has failed).

Then MasterTransfer calls a (local) Sproc (PullDataFromRemoteServer_ToStagingTables) that pulls the data from Remote to Local staging tables - as above, this Sproc calls a separate Child Sproc for each table to be transferred and each Sproc logs its result.

Finally MasterTransfer calls an Sproc that processes the (now) local data - as above, this Sproc calls a separate Child Sproc for each table to be transferred and each Sproc logs its result.

In case relevant this is what I see in the Logs:

Remote:

2010-10-20 04:01:08.693 Started
2010-10-20 04:06:16.763 Final Sproc logged. Next one would be Big Oracle Query

2010-10-20 04:31:08.730 Next batch run started

Local:

2010-10-20 04:02:00.743 Started (clocks not in sync? ). Calls Remote
2010-10-20 04:15:56.730 PullDataFromRemoteServer_ToStagingTables started

Edit: Checked the clocks, Remote Server is 50 seconds behind Local Server

Which suggests that between approx 04:06:16.763 on remote and 04:15:56.730 on local the Big Oracle Query was running, and then terminated for some reason.

I could schedule the Xfers on Remote, but it is crucial that they run sequentially with the ones locally, hence I have been calling Remote from Local so they cannot accidentally overlap.

Kristen
Test

22859 Posts

Posted - 2010-10-21 : 07:20:35
Current thinking is that the Oracle Driver is fragile and it may be timing out.

Any ideas why this would not be being trapped anywhere?

PullDataFromOracle Sproc does a:

SELECT *
INTO #TempTable
FROM OPENQUERY(LinkedOracleServer, "SELECT * FROM OracleTable")

lets assume that gets a timeout or some sort of network error from the Oracle driver.

PullDataFromOracle is is on RemoteServer and is called from the local server:

EXEC @intError = RemoteServer.RemoteDatabase.dbo.PullDataFromOracle
@intRetVal = @intRetVal_LOCAL OUTPUT
IF @intError <> 0 OR @intRetVal_LOCAL <> 0 OR @@ERROR <> 0
BEGIN
... Error Handler ...
GOTO MyExitLabel
END
INSERT INTO LogTable(MyDateTime, MyMessage) SELECT GetDate(), 'PullDataFromOracle completed OK'

any ideas why this is not catching any sort of error? and just carrying on?

Can I simulate this error so that I can test error handling to find a way to catch the error?
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-10-21 : 07:34:06
P.S. Probably ALL these Sproc (and if not ALL then MOST) will have

SET XACT_ABORT ON

which might be causing/allowing a "Carry On" situation?
Go to Top of Page
   

- Advertisement -