 Log shipping: log bkup file corrupt during copy(?)


beadlesm
Starting Member

5 Posts

Posted - 2007-03-06 : 23:10:35
Interesting problem here - seeking comments:

The setup:
Log ship from source to two destinations.

The issue:
Destination One log shipping never fails.
Destination Two log shipping fails every two to four days - sometimes more often.

The temporary fix:
Re-copying the tran log backup file that caused log shipping to fail gets Destination Two log shipping running again, until it hits another "problem" tran log backup file.

Failure indications in the log:
I see these messages in the copy/restore history on Destination Two:

[Microsoft SQL-DMO (ODBC SQLState: HY000)] Error 3624: [Microsoft][ODBC SQL Server Driver][SQL Server]
[Microsoft][ODBC SQL Server Driver][SQL Server]Location: page.cpp:2787
Expression: slot < m_slotCnt
SPID: 64
Process ID: 4172

[Microsoft SQL-DMO (ODBC SQLState: HY000)] Error 7987: [Microsoft][ODBC SQL Server Driver][SQL Server]
A possible database consistency problem has been detected on database 'MyDatabase'.
DBCC CHECKDB and DBCC CHECKCATALOG should be run on database 'MyDatabase'.


[Microsoft SQL-DMO (ODBC SQLState: HY000)] Error 3456: [Microsoft][ODBC SQL Server Driver][SQL Server]
Could not redo log record (991099:13406:9), for transaction ID (0:0), on page (3:1180848), database 'MyDatabase'(5).
Page: LSN = (991099:8410:5), type = 11. Log: OpCode = 7, context 11, PrevPageLSN: (5185403:8410:5).


All three servers are 2000 EE, SP4 + AWE fix (.2040)

My guess is that there is some file corruption happening during the copy task only to Destination Two.

Destination Two has been completely re-installed from the ground up - hardware is the same though. Source and Destination 1 are on Gigabit, Destination 2 is on 100 Mbps full duplex. Wondering if SAN drivers could be playing a part in this too.

Source has all normal maintenance being done, and DBCC Checkdb maint jobs do not report any errors (otherwise, wouldn't Destination 1 log shipping fail as well?).

Interestingly, a colleague having similar issues on a server in that same server room finally stopped using it, and brought up a new server in another city for his destination, and no problems.

Anyone see anything like this?

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2007-03-07 : 02:15:53
Did you run DBCC CHECKDB and DBCC CHECKCATALOG on MyDatabase in Query Analyzer?

If you were to restore the database on the primary server (using a new name so that you don't impact your current database) using the same files that are failing at the secondary server, are you able to successfully restore them?

Tara Kizer

beadlesm
Starting Member

5 Posts

Posted - 2007-03-07 : 16:45:49
Regarding running DBCC Checkdb on the destination db:

No, have not run DBCC Checkdb, since I am running the db in NoRecovery log shipping mode. (I just switched it to Standby / disconnect users mode today, just to see what happens).

The other reasons I haven't tried that yet (as mentioned above):

1) The same file is shipped to another LS destination server in a different state (on a very slow network pipe, btw), which has never produced any of these errors.

2) A simple re-copying of the file to this problematic destination server always clears the problem, and log shipping continues fine, for a while. (The copy can come from the source or from the other destination server in the different state; either way clears the problem.)

We have a call into our MS support rep, and I will let the forum know what the rep advises. But any other comments are welcome. Thanks tkizer - will certainly keep this in mind as something to try.

rlaubert
Yak Posting Veteran

96 Posts

Posted - 2007-03-08 : 10:41:37
I can think of two issues you may be experiencing.
One is a network communication issue which is corrupting the file. This is unlikely but I have had a client that had an industrial mixer that was causing periodic electronic interference on the network and corrupting all communications 4 times a day.
The second is a bad sector on the hard drive. When the file is copied and verified, the data is corrupted and therefore generates an error. How old are the drives in this system? If they are more than a year or two old, you may want to replace them and see if that doesn't clear the problem.

Raymond Laubert
MCDBA, MCITP:Administration, MCT

beadlesm
Starting Member

5 Posts

Posted - 2007-03-11 : 15:22:04
Thanks Raymond,

I have the same suspicions (perhaps when I'm blending my Margarita in my cubicle, the same thing happens?)

All kidding aside - you are right. I wrote a batch file that computes the MD5 hash of each tran log backup file after it reaches the destination and compares it to the MD5 hash of the same file on the source. It uses MS Logparser (to trap the latest file names on the destination server) and MS fciv.exe (the MD5 program).

It confirms that the hashes match when log shipping runs OK and that they differ exactly when log shipping breaks. I now have the Network admin and SAN admin groups looking into this. MS rep is betting on the SAN. Will advise the forum as to final outcome.
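
For anyone who wants to repeat this kind of check, here is a minimal sketch of the same idea in Python. It is not the batch file described above, which used Logparser and fciv.exe. The function names, the `*.trn` file pattern, and the idea of passing in a source and a destination folder are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large backup files are not loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatched_backups(source_dir, dest_dir, pattern="*.trn"):
    """List destination files whose MD5 differs from the same-named source file.

    The "*.trn" pattern and both folder arguments are hypothetical; adjust
    them to wherever the log backups actually live.
    """
    mismatched = []
    for dest_file in sorted(Path(dest_dir).glob(pattern)):
        source_file = Path(source_dir) / dest_file.name
        # A file missing on the source is reported as a mismatch too.
        if not source_file.is_file():
            mismatched.append(dest_file.name)
        elif md5_of_file(source_file) != md5_of_file(dest_file):
            mismatched.append(dest_file.name)
    return mismatched
```

Calling `find_mismatched_backups` with the source share and the destination copy folder returns the names of the files that would need to be re-copied. An empty list means every copied file matches its source.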

beadlesm
Starting Member

5 Posts

Posted - 2007-03-21 : 15:03:59
Problem is fixed by Network admins changing Destination 2 over to Gigabit (so that it matches Source and Destination 1).


For our records, the corrective actions were:


1) Complete rebuild and driver install of Destination 2 SQL (they kept the same SAN config, though). The problems continued.

2) Moving Destination 2 SQL network transport from 100 FULL to Gigabit so that it matched Source SQL and Destination 1 SQL. The problems stopped.



The problem indications were:

Log shipping works for several hours or several days, then fails with any of the following errors:

The MD5 hash of the copied tran log backup file on Destination 2 does not match the hash of the file on the source. This was discovered after the errors showed up in the SQL Error Log (see below).


Error: 3314, Severity: 21, State: 3
Error while undoing logged operation in database 'MY_DB'. Error at log record ID (992897:96594:277).

[Microsoft SQL-DMO (ODBC SQLState: 42000)] Error 4323: [Microsoft][ODBC SQL Server Driver][SQL Server]
The database is marked suspect. Transaction logs cannot be restored. Use RESTORE DATABASE to recover the database.
[Microsoft][ODBC SQL Server Driver][SQL Server]RESTORE LOG is terminating abnormally.


[Microsoft SQL-DMO (ODBC SQLState: HY000)] Error 3624: [Microsoft][ODBC SQL Server Driver][SQL Server]
[Microsoft][ODBC SQL Server Driver][SQL Server]Location: page.cpp:2787
Expression: slot < m_slotCnt
SPID: 64
Process ID: 4172

[Microsoft SQL-DMO (ODBC SQLState: HY000)] Error 7987: [Microsoft][ODBC SQL Server Driver][SQL Server]
A possible database consistency problem has been detected on database 'MY_DB'.
DBCC CHECKDB and DBCC CHECKCATALOG should be run on database 'MY_DB'.


[Microsoft SQL-DMO (ODBC SQLState: HY000)] Error 3456: [Microsoft][ODBC SQL Server Driver][SQL Server]
Could not redo log record (991099:13406:9), for transaction ID (0:0), on page (3:1180848), database 'MY_DB'(5).
Page: LSN = (991099:8410:5), type = 11. Log: OpCode = 7, context 11, PrevPageLSN: (5185403:8410:5).
