SQL Server Forums
Profile | Register | Active Topics | Members | Search | Forum FAQ
 
Register Now and get your question answered!
Username:
Password:
Save Password
Forgot your Password?

 All Forums
 General SQL Server Forums
 Data Corruption Issues
 dbcc errors
 New Topic  Reply to Topic
 Printer Friendly
Author Previous Topic Topic Next Topic  

bstubler
Starting Member

USA
4 Posts

Posted - 09/07/2006 :  18:15:45  Show Profile  Reply with Quote
I'm running MS SQL 2000 sp3 under 2000 server sp4. I lost a hard drive in a RAID5 and afterwords my weekly maintenance (rebuild and check links) errored five days later. I'm on a weekly backup cycle and running Great Pains Dynamics 6.0. Can't go back now.
Here's what dbcc checkdb found on my 5 gig db.

DBCC results for 'UPR41400'.
There are 5 rows in 1 pages for object 'UPR41400'.
Server: Msg 8928, Level 16, State 1, Line 1
Object ID 171147655, index ID 0: Page (1:192881) could not be processed. See other errors for details.
Server: Msg 8944, Level 16, State 1, Line 1
Table error: Object ID 171147655, index ID 0, page (1:192881), row 56. Test (ColumnOffsets <= (nextRec - pRec)) failed. Values are 4203 and 111.

I'm keeping a warm backup server up to date so I can run whatever you suggest on it without harming my production db further. I'm monitoring the db daily and after seeing some other posts, I realize i'm very lucky this is the only error. At the end of the messages it informs me that 'repair_with_allow_data_loss' is my first option. The strange thing is I can return all rows on the table without error. It's a tax table.
Do I have any other options?
Thanks


paulrandal
Yak with Vast SQL Skills

USA
899 Posts

Posted - 09/07/2006 :  18:49:00  Show Profile  Visit paulrandal's Homepage  Reply with Quote
There's a corrupt record in the middle of page (1:192881) - one of the pointers in the record points off the end of the record. The repair will deallocate the entire page, deleting all the records on it. Yes, you're very lucky. I'm guessing you can return all the rows because the schema is such that the offset wasn't needed to crack the record contents.

Did you run a DBCC CHECKDB after you lost the drive?

Can you rebuild the data in this table? I'm guessing yes from your description. If so, that would be my first choice.

Do you do any other backups apart from a full backup weekly? If not, you're opening yourself up to losing a whole week's worth of work (if a disaster occurs just before your weekly backup).

Thanks

Paul Randal
Lead Program Manager, Microsoft SQL Server Core Storage Engine (Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)

Edited by - paulrandal on 09/07/2006 18:49:42
Go to Top of Page

bstubler
Starting Member

USA
4 Posts

Posted - 09/07/2006 :  19:53:05  Show Profile  Reply with Quote
Thank you Paul for your quick responce.
My backup scheme is full back nightly to DLT tape,5 tape rotation, plus translogs every 2 hours during the day. I update the warm standby daily.
I know very little SQL, just enough to get in trouble all self taught.
How do I rebuild the table? If it's too detailed just the thumbnail and I'll research it.
I did the dbcc checkdb after the drive failure and RAID rebuild. I now realize i need to run it on the master and Dynamics dbs. Found the procedure to checkdb via Google.
Thank you
Go to Top of Page

paulrandal
Yak with Vast SQL Skills

USA
899 Posts

Posted - 09/07/2006 :  21:05:19  Show Profile  Visit paulrandal's Homepage  Reply with Quote
What I mean by rebuild is, do you have the data in the table stored somewhere else? I get the impression its a table with static tax data in - in which case you could drop the table and recreate it with no loss of data.


Paul Randal
Lead Program Manager, Microsoft SQL Server Core Storage Engine (Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)
Go to Top of Page

Kristen
Test

United Kingdom
22403 Posts

Posted - 09/08/2006 :  04:41:39  Show Profile  Reply with Quote
"I know very little SQL, just enough to get in trouble all self taught."

You know what your limitations are ... that's very refreshing to hear!

Do you have ALL the Transaction Log backups since the disk error? If not skip this suggestion:

You could try recovering (to a NEW test database) the FULL backup from immediately before the Disk Error, and then every TLog backup since (take one final TLog backup before you start!). Seeing as your error is in the MDF file there is a very good chance that it will not be duplicated in the TLog files - particularly if they are on a different disk channel. See below for suggestions on checking what is missing etc.

Otherwise:

I would suggest you restore to as-near-to the disk failure as you can to a NEW database. (Hopefuly obvious but do NOT overwrite any existing database!) Call that database OLD

You also need TWO restored copies of the current database (again, to new database names). - Lets call them REPAIRED and CURRENT.

You will need to repeat this process for real, later on, with the live database "down" [from the users perspective], so worth writing up every step you do to make it easy, and reliable, to repeat later.

You should be running this on a QA server - your warm standby sounds like a good candidate, but this Dress Rehearsal could be done on a workstation/laptop or whatever (disk space permitting).

Run the DBCC on the REPAIRED database until it doesn't give any more errors. Use whatever data-loss commands DBCC recommends, or you feel the need for!

Then check what got deleted by comparing REPAIRED and CURRENT. This will fail if the data corruption is in a critical section, but sounds like that is not the case for your database. You might want to try Red Gate's SQL Compare for this task (they have a trial period I think)

Or use:

SELECT MyPK, ... other useful columns ...
-- INSERT INTO REPAIRED.dbo.MyTable SELECT *
FROM CURRENT.dbo.MyTable AS C
     LEFT OUTER JOIN REPAIRED.dbo.MyTable AS R
         ON R.MyPK = C.MyPK
WHERE R.MyPK IS NULL

this will show you records on CURRENT that no longer exist in REPAIRED. You can reinstate these by commenting-in the INSERT statement.

I don't know if you need to do a column-by-column test, but I suspect not (my understanding is that DBCC will throw away whole rows where there is a corruption to the data page)

Then you should repeat this with the OLD database - you are looking for records which used to exist in the OLD database, but no longer exist in the REPAIRED database (presumably there will be lots of Changed and New records in the REPAIRED database). This may be more tricky because there may be legit reasons why some rows no longer exist (deleted by user / application etc.) and the data on those rows will be stale, and may breach referential integrity (e.g. references a customer who's account number has changed, or somesuch), so re-instating them may have its own problems. But again, getting a list like above and deciding what needs to be reinstated will allow you to "rebuild the data in this table".

Another table-rebuild strategy would be to BCP out the data from that table [use the CURRENT database, not the REPAIRED one] (using BCP's Native mode to preserve the data as-is) and then truncate the table and re-import it [into the REPAIRED database] and compare. (If the truncate gives you referential integrity errors you will have to DROP the respective Foreign Keys, and recreate them after the re-import)

After you've done your Dress Rehearsal the live performance can begin!

Disconnect all users (one way would be to set the database to DBO only (but check that works, some application(s) may be connecting as SA [which is in itself something you would need to address, very lax for security!]) failing that set the database to single user (but that will hamper you ability to work with it). Failing that take the database off-line and work on a copy.

AFTER its not available to users take a FULL backup.

Restore the FULL backup to another machine (to a temporary database)

Run:

DBCC CHECKDB('MyDatabaseName') WITH NO_INFOMSGS

on BOTH the original and the restored copy. Check that the output is identical. This proves that you CAN restore from the backup file you just made, I know you will be tight for time at this point, but no point having a backup that cannot be restored successfully on a different machine if needs be.

Repeat the steps from your dress rehearsal on the Live database (or a copy of it if you have to take it offline)

Then put the database back online (or Backup and Restore it if you had to work on a copy elsewhere - and run another DBCC CHECKDB to check its OK.)

Couple of other observations:

Build a DBCC CHECKDB into your maintenance processes - daily if it doesn't take too long. Use your existing, broken, database to double check that the alert the new test will raise does actually get seen!.

I would increase your TLog backups to every 10~15 minutes. You will backup the same volume of data in a day, there will just be more files, but when you get to know something has broken the potential data-lost will be less. Those Tlog backups should be immediately copied to another machine (in case the master server blows up!) - or use Log Shipping to get the data onto a warm standby server.

Perhaps double check the alert when a RAID drive fails - you need to get to hear about that promptly so you can run a DBCC CHECKDB at the earliest moment. (We had RAID5 drive fail that trashed the database, until then I thought RAID gave me the necessary redundancy )

Some suggestions (you may already be doing, or could plan to do)

Use RAID10 rather than RAID5.

Put the MDF and LDF files on different disk channels.

Put the Backups on yet-another-channel

If backups cannot be on their own channel then put backups on the LDF channel (data corruption to MDF file should still be recoverable by: Disable database, take final TLog backup, restore last FULL and all subsequent Tlog backups)

Kristen

Edited by - Kristen on 09/08/2006 04:42:39
Go to Top of Page

bstubler
Starting Member

USA
4 Posts

Posted - 09/08/2006 :  16:56:26  Show Profile  Reply with Quote
Thank you Paul and Kristen, you've given me alot to work with. Drop and recreate the bad table on my backup server and run dbcc checkdb. If it's repaired do it on the production server. I'll do that after hours in single user mode.
If not run 'dbcc repair allow data loss' on the backup and compare with the production. Then take what action seems appro after seeing what was deleted.
I'll let you know how it worked out.
If anyone else wants to chime in I'll filter it into the pot.
Thanks again
Bob Stubler
Network Admin
Go to Top of Page

bstubler
Starting Member

USA
4 Posts

Posted - 09/11/2006 :  15:46:48  Show Profile  Reply with Quote
Paul and Kristen,
I did a 'check db (repair_rebuild)' on my last restore to my warm server as soon as I got back to the shop and it cleared all errors on my next 'check db'. My errant table still contains it's data also. I'll perform on production db tonight after hours. I'm going to read through all the postings on this forum and try to learn this stuff.
Thank you
Bob
Go to Top of Page
  Previous Topic Topic Next Topic  
 New Topic  Reply to Topic
 Printer Friendly
Jump To:
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC Go To Top Of Page
This page was generated in 0.06 seconds. Powered By: Snitz Forums 2000