 to hell in a handbasket

martinn
Starting Member

2 Posts

Posted - 2007-07-09 : 21:28:33
I have several smallish databases running on an MPC (www.mpccorp.com) server. Device Manager says it has an LSI Logic 1020/1030 Ultra320 SCSI Adapter and a MegaRAID SATA 150-6 RAID controller. It doesn't have any kind of Windows-accessible RAID management interface.

Several months ago I started getting corrupt databases. They would get errors that DBCC CHECKDB couldn't fix. I never found specific help on this, but most of the similar issues I saw pointed toward the RAID controller. We contacted MPC, who had updated RAID firmware for us to try. We flashed the RAID card, reformatted the disks, and restored everything from the last good backup (it had been throwing errors for a couple of weeks before I noticed them).

All was good for about a month, but now I'm back to the same situation. I have several corrupt databases. I have good backups, but can't even restore them because I get errors on the restore. My next step is to pay for an incident with Microsoft, but I suspect they'll just point me back to the hardware. If you have any suggestions for problem determination or resolution, I'd sure appreciate them!

Cheers,
Martin Nickel

Sample corruption error:
SQL Server detected a logical consistency-based I/O error: torn page (expected signature: 0x0; actual signature: 0x3f380c2c). It occurred during a read of page (1:9) in database ID 9 at offset 0x00000000012000 in file 'E:\Program Files\Microsoft SQL Server\MSSQL\Data\MyDB.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

Sample error during DBCC CHECKDB:
Msg 8939, Level 16, State 98, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID -9156028125792763904 (type Unknown), page (34262:2139451659). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 29362185 and -4.
Repairing this error requires other errors to be corrected first.

Sample database restore error:
Msg 3283, Level 16, State 1, Line 1
The file "MyDB_log" failed to initialize correctly. Examine the error logs for more details.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.

paulrandal
Yak with Vast SQL Skills

899 Posts

Posted - 2007-07-10 : 13:27:53
Hi Martin,

A torn page is caused by hardware problems: something prevented the drive from writing all of a page to disk. I recommend you turn on page checksums, which provide a finer granularity of hardware-error detection. Are you having any power issues?
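
For anyone following along, here is a minimal sketch of turning on page checksums for several databases. It only builds the T-SQL text; running it against the server is left to whatever tool you normally use. The helper name `page_verify_statement` is made up for illustration, and the database names are simply the two that appear in the error messages in this thread. `ALTER DATABASE ... SET PAGE_VERIFY CHECKSUM` is the SQL Server 2005+ syntax; note that a checksum is only stamped on a page the next time that page is written, so existing pages are not covered right away.

```python
# Sketch only: builds ALTER DATABASE statements, does not connect to SQL Server.
# page_verify_statement is a hypothetical helper name, not part of any library.

ALLOWED_MODES = {"CHECKSUM", "TORN_PAGE_DETECTION", "NONE"}


def page_verify_statement(database_name, mode="CHECKSUM"):
    """Return the T-SQL that sets the PAGE_VERIFY option for one database."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported PAGE_VERIFY mode: {mode!r}")
    # Bracket-quote the name; a literal ']' inside a name is doubled.
    quoted = "[" + database_name.replace("]", "]]") + "]"
    return f"ALTER DATABASE {quoted} SET PAGE_VERIFY {mode};"


if __name__ == "__main__":
    # Database names taken from the error messages in this thread.
    for name in ["MyDB", "ZSDB"]:
        print(page_verify_statement(name))
```

Paste the printed statements into a query window and run them one database at a time.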

Also, check out the video of my TechEd presentation, which details a bunch of ways to detect and troubleshoot corruptions - http://blogs.msdn.com/sqlserverstorageengine/archive/2007/06/27/teched-session-video-available.aspx

Thanks

Paul Randal
Principal Lead Program Manager, Microsoft SQL Server Core Storage Engine
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)
http://blogs.msdn.com/sqlserverstorageengine/default.aspx

martinn
Starting Member

2 Posts

Posted - 2007-07-10 : 16:08:20
Thanks Paul for your reply,

I turned torn page detection on and get things like this:

SQL Server detected a logical consistency-based I/O error: torn page (expected signature: 0x55555555; actual signature: 0xea801ab9). It occurred during a read of page (1:23962) in database ID 8 at offset 0x0000000bb34000 in file 'E:\Program Files\Microsoft SQL Server\MSSQL\DATA\ZSDB.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
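
As a sanity check on these messages, the byte offset they report is just the page number multiplied by SQL Server's 8 KB page size, so the page ID and the file offset should always agree. The short sketch below reproduces the arithmetic for the two pages quoted in this thread; the function names are made up for illustration.

```python
# SQL Server data pages are 8 KB (8192 bytes).
PAGE_SIZE = 8192


def page_offset(page_number):
    """Byte offset of a page within its data file."""
    return page_number * PAGE_SIZE


def page_from_offset(offset):
    """Page number for a byte offset; the offset must be page-aligned."""
    page_number, remainder = divmod(offset, PAGE_SIZE)
    if remainder:
        raise ValueError("offset is not on a page boundary")
    return page_number


if __name__ == "__main__":
    print(hex(page_offset(9)))          # 0x12000, matches page (1:9)
    print(hex(page_offset(23962)))      # 0xbb34000, matches page (1:23962)
    print(page_from_offset(0xbb34000))  # 23962
```

Both offsets line up with their page IDs, so the messages are at least internally consistent about where the bad reads happened.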

We're treating it as a hardware problem and have contacted MPC (the hardware vendor) for support.

All the best to you,
Martin Nickel