Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
Kristen
Test
22859 Posts |
Posted - 2010-10-20 : 05:40:38
|
I have an Sproc "MasterTransfer" which does:EXEC @intErrNo = RemoteServer.RemoteDatabase.dbo.PullDataFromOracleEXEC @intErrNo = dbo.PullDataFromRemoteServer_ToStagingTablesEXEC @intErrNo = dbo.ProcessDataFromStagingTablesThis Sproc is scheduled to run half-hourly.Between 04:00 and 06:59 the remote PullDataFromOracle Sproc pulls data from an additional Oracle table. It fails every time (I presume at this extra step, which the logging suggests is the case - preceding step logs a result, the big oracle query is not in the log table)I ran it manually and it ran fine, however the extra Oracle pull took 12 minutes to run (and pulled 140,000 rows).I can't see that there is anything wrong with the Sproc.Note that the Sproc runs on the half-hour, so if there was some blocking on Oracle (say) that should only effect some of the runs between 04:00 and 06:59 (unless there is something going on the whole of that period, my gut feeling is that is unlikely)Is it possible that there is a default timeout on the EXEC of the RemoteSproc ?Nothing in the logs of any note (neither Remote nor Local SQL Server Error log, nor SQL Server Agent log)Would a Network Error show up in the logs?Ordinarily, when it works without the additional big Oracle query, the RemoteSproc takes 9 minutes - seems long enough to have failed if there was a timeout? So in total, when it fails, it would need to run for about 22 minutes.When RemoteSproc fails the remainder of the local Sprocs still run - even though every Sproc calls has a check for Return Value AND for @@ERRORDescription of process:MasterTransfer calls an Sproc (PullDataFromOracle) on the Remote Server which pulls data from a variety of tables in a yet-more-remote Oracle database into staging tables on RemoteServer.PullDataFromOracle calls a separate Child Sproc for each table to be transferred from Oracle, and one of those has an extra Oracle query if run between 04:00 and 06:59. Each child Sproc logs success to a logging table (hence I think it is possible to deduce that the next-sproc-after-last-one-logged has failed).Then MasterTransfer calls a (local) Sproc (PullDataFromRemoteServer_ToStagingTables) that pulls the data from Remote to Local staging tables - as above, this Sproc calls a separate Child Sproc for each table to be transferred and each Sproc logs its result.Finally MasterTransfer calls an Sproc that processes the (now) local data - as above, this Sproc calls a separate Child Sproc for each table to be transferred and each Sproc logs its result.In case relevant this is what I see in the Logs:Remote:2010-10-20 04:01:08.693 Started2010-10-20 04:06:16.763 Final Sproc logged. Next one would be Big Oracle Query2010-10-20 04:31:08.730 Next batch run startedLocal:2010-10-20 04:02:00.743 Started (clocks not in sync? ). Calls Remote2010-10-20 04:15:56.730 PullDataFromRemoteServer_ToStagingTables startedEdit: Checked the clocks, Remote Server is 50 seconds behind Local ServerWhich suggests that between approx 04:06:16.763 on remote and 04:15:56.730 on local the Big Oracle Query was running, and then terminated for some reason.I could schedule the Xfers on Remote, but it is crucial that they run sequentially with the ones locally, hence I have been calling Remote from Local so they cannot accidentally overlap. |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-10-21 : 07:20:35
|
Current thinking is that the Oracle Driver is fragile and it may be timing out.Any ideas why this would not be being trapped anywhere?PullDataFromOracle Sproc does a:SELECT *INTO #TempTableFROM OPENQUERY(LinkedOracleServer, "SELECT * FROM OracleTable")lets assume that gets a timeout or some sort of network error from the Oracle driver.PullDataFromOracle is is on RemoteServer and is called from the local server:EXEC @intError = RemoteServer.RemoteDatabase.dbo.PullDataFromOracle @intRetVal = @intRetVal_LOCAL OUTPUTIF @intError <> 0 OR @intRetVal_LOCAL <> 0 OR @@ERROR <> 0BEGIN ... Error Handler ... GOTO MyExitLabelENDINSERT INTO LogTable(MyDateTime, MyMessage) SELECT GetDate(), 'PullDataFromOracle completed OK' any ideas why this is not catching any sort of error? and just carrying on?Can I simulate this error so that I can test error handling to find a way to catch the error? |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-10-21 : 07:34:06
|
| P.S. Probably ALL these Sproc (and if not ALL then MOST) will haveSET XACT_ABORT ONwhich might be causing/allowing a "Carry On" situation? |
 |
|
|
|
|
|
|
|