Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 Data Corruption Issues
 Repeated Spontaneous Corruption

Author  Topic 

DanielDucharme
Starting Member

11 Posts

Posted - 2010-04-16 : 14:41:14
We are currently having a large problem in the office where I am system administrator with a database the is repeatedly spontaneously corrupting. This is on SQL Server 2005 hosted on a Windows 2003 Server. This server is hosting 14 databases which do not have problems and then 1 database that keeps corrupting. This is happening if someone logs onto the program we use to access it or not. I have tried talking to the customer support people whose program we are using, but they managed to get the database up an running from the same backup I am using to restore it each time and it did not corrupt in there office.

All 14 databases are accessed by the same program as they are used to store real estate data to do revaluations and this database had worked fine 3 years ago when we lasted used it but the last time I moved it from the town hall to our office the problems began. I have now gone to the town hall and grabbed copies on 3 separate occasions to ensure it was not a bad backup but that hasn't helped.

Every time it has corrupted I have deleted the database entirely, recreated the database and restored from the backup I have. Around 4-7 days later I get the error that allocation unit xxx is found at yyy but belongs to zzz where xxx, yyy, and zzz are different numbers every time. DBCC CHECKDB fails immediately with the same error and says that it cannot be repaired and SQL Server Manager identifies that the database is suspect. I have turned disk caching off on the server as I read online that it could be the cause and yet all that accomplished is to slow down the server as it is still happening. I have also run SQLIOSIM and the only errors had to do with not being able to query the cache (it was turned off at that time).

We need to begin working on this database so any ideas about how this issue could be fixed and what might be causing it would be appreciated. I am a Computer Science Masters student and know a good amount of programming as I am writing a driver for my thesis so don't worry about the code being to complex.

Thank you in advance and I hope this can be resolved quickly.

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-04-17 : 05:04:49
Repeated corruption = faulty IO subsystem. Something, somewhere in the IO stack (driver, cache, controller, disk, etc)

If CheckDB says there's no fix, then there's no fix. Restore from backup and investigate the root cause. I'd suggest that you get it onto a different IO subsystem to start with and investigate from there.

--
Gail Shaw
SQL Server MVP
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-04-17 : 08:08:05
Might be worth checking what PAGE_VERIFY method the DB is set to. Setting it to CHECKSUM will help cause it to give errors as soon as the corruption occurs, which might enable you to pinpoint the culprit (checking event logs, user moans, etc.)

If the DB is not already set to CHECKSUM then this will only apply to pages that are re-written after you make the change ... so of limited use ... forcing a REINDEX of all table indexes will cause most file pages to be rewritten, and thus have a Checksum added, which will help ...

Increasing the frequency of TLog backups will give you more chance of being able to Restore to point-in-time when it happened - store the Backup files on a different I/O system to the database files for added security.

But as Gail says, changing [i.e. "fixing"] the I/O system or, pro temp, moving the files to a different I/O system should cure the problem. My suggestions are just to try and gain a bit of extra belt-and-braces, not any sort of cure / workaround.
Go to Top of Page

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-04-17 : 12:31:39
quote:
Originally posted by Kristen

Setting it to CHECKSUM will help cause it to give errors as soon as the corruption occurs


No it won't. The page verify settings are not corruption detection methods, they allow the DB engine to spot corruption when the page is read. There is no method that detects corruption when it occurs.

What page_verify = checksum does is detect any changes made to the page outside of SQL Server. This is because the checksum is the last thing written to a page before it goes to disk and the first thing checked when it's read again. An invalid checksum therefore means that the page was modified while it was on disk.

SQL doesn't periodically check page's checksums for fun. It only checks them when the page is read for some reason (query, checkDB, backup made with the checksum option) and it is only at that point that corruption (if present) will be detected, not when corruption occurs.





--
Gail Shaw
SQL Server MVP
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-04-17 : 14:51:21
Yeah, I realise all that. But I disagree with your conclusions.

You are correct that setting CHECKSUM = VERIFY will not alert the moment that a corruption occurs, and on re-reading my post teh way I described it was imprecise, but IMO setting CHECKSUM = VERIFY is very likely to give early warning that the database is corrupted, due to the very fact that people are using the DB.

For example, most data maintenance routines re-display the record to the user after doing an update - so an UPDATE/INSERT that gets corrupted will immediately report an error on the subsequent SELECT. Also, people are most probably doing a variety of Reports and Queries which read lots of records / pages from the database - particularly if they happen to do a table scan.

Any early warning of data corruption, correctly acted upon, will in turn mean that people don't carry on entering data into a corrupted database (which they then wind up having to repeat).

Its no substitute for fixing the underlying problem, and it may not save your bacon, but there is a fair chance that it will provide an alert earlier than the next DBCC CHECKDB.
Go to Top of Page

DanielDucharme
Starting Member

11 Posts

Posted - 2010-04-17 : 22:15:39
While I agree it sounds like an IO system problem, why is it only this database and none of the 14 other ones. This is not a one time thing either this has happened on over 10 different occasions over the last 2 months and always to the exact same databases while the rest are being used with no problems. This had caused me to rule out hardware and driver issues as I thought that it would have to affect more. As to the CHECKSUM I will look when I am back in the office on Tuesday and see what it is set to and will set it to VERIFY if it is not already.

As for switching it to a different IO subsystem, we only have one server in the office and as it is working fine for all our files and keeping our domain up the boss may be reluctant to have anything changed. What sort of errors should I be looking for in the event log that would point to this kind of problem?

I do have a good backup that I am restoring the corrupted version to, and at this time no one is making any changes but they will be soon which is why it is so important that I get this issue resolved, and if that means swapping out parts from our server then that is fine, I just need to make sure that will fix the problem because the company doesn't have money to through away on potential fixes.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-04-18 : 02:19:44
Is there nothing else you can put the database on short term? If it fails on some other hardware too it would isolate the problem to that database (as per your suspicion).

"What sort of errors should I be looking for in the event log that would point to this kind of problem?"

Hardware errors - particularly Disk / Controller

Are you using Full Recovery Model? (Sounds like you ought to be if the database is updated by multiple people and is thus reasonably mission critical). Check the TLog backups scheduled for no less frequently than every 15 minutes, and are being stored to a different drive (and preferably controller) to the main database - getting them onto a different machine might be better, but I don't have any knowledge of cross machine backups (performance issues etc.).

Schedule a DBCC CHECKDB daily and make sure that any errors are picked up (which probably means manually checking the result log each day, rather than relying on email notification which might fail if the controller fails etc.)
Go to Top of Page

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-04-18 : 06:55:27
quote:
Originally posted by Kristen... but IMO setting CHECKSUM = VERIFY is very likely to give early warning that the database is corrupted, due to the very fact that people are using the DB.


No, sorry, I disagree.

There are three points at which corruption may be detected on a page.
1) When the page is read initially off disk. Detected by incorrect checksum or invalid torn page bits. Immediately triggers an error.
2)When the page is cracked and the header or footer are invalid. Only an option if (1) did not happen. Any error that damages the page's header or footer will be picked up here. That includes the common case of the IO subsystem zeroing out blocks of the disk, as well as the type that the OP described, pages belonging to the wrong object.
3) When the row is cracked and some of the row header or structure is damaged. Assumes (1) or (2) did not happen
4) Never. A byte or two changed within a data value that doesn't affect the header of row or page. This is the only type that page verify = checksum will detect that otherwise would not be detected.

The checksum will ensure that everything is picked up at step 1, not 2 or 3, and that 4 doesn't happen, but we're not talking hours or days earlier. It's mostly about more reliable detection, and proof that it's happening outside of SQL.

quote:
so an UPDATE/INSERT that gets corrupted will immediately report an error on the subsequent SELECT.

That could only happen if there's memory problems or really, really fast page flushing. If a row is updated and immediately selected, the subsequent read is from cache, not from disk, and checksums are not checked on dirty pages in the buffer pool. The majority of corruption is done to pages that are hardened on disk, not ones involved in active transactions (unless you have memory corruption or a faulty filter driver)

As for active users - there are always portions of the database that are seldom read. Archive tables, seldom used NC indexes, LOB pages, etc. Corruption on those can hide for ages if the pages are never needed.

Checksum is useful because it can detect all forms of damage at the point the page is read in, hence proving that the damage was done outside of SQL.

--
Gail Shaw
SQL Server MVP
Go to Top of Page

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-04-18 : 07:04:08
quote:
Originally posted by DanielDucharme

While I agree it sounds like an IO system problem, why is it only this database and none of the 14 other ones.


Size of database, faulty portion of disk, frequency of use, intensity of use, etc. There can be many reasons.

Repeated corruption is one of two things, SQL bug (and from the, admittedly brief, description what you have doesn't sound like a SQL bug), or IO subsystem. There's a very slim chance that it's memory errors, but that's a rare and nasty one, often with more symptoms than just corrupt databases. Hopefully not what you have.

See - http://www.sqlskills.com/BLOGS/PAUL/post/How-to-tell-if-the-IO-subsystem-is-causing-corruptions.aspx for some info on what, why and how.

Could you perhaps post the version of SQL that you're working with (SELECT @@version) and the output of checkDB (run without any of the repair options)


DBCC CHECKDB (<Database Name>) WITH NO_INFOMSGS, ALL_ERRORMSGS

If need be, I'll pass it up to an expert on corruption for analysis.

quote:
As for switching it to a different IO subsystem, we only have one server in the office and as it is working fine for all our files and keeping our domain up the boss may be reluctant to have anything changed.


You're running SQL on a file server/domain controller? There are very good reasons why that's a bad idea, especially if the DBs start getting heavier usage. Security and performance among others. I know it's off topic for this thread, but that's just a f.y.i.

--
Gail Shaw
SQL Server MVP
Go to Top of Page

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-04-18 : 09:29:56
Daniel, can you also check the SQL error log? You should have some high severity errors recorded (severity >20), probably error number 823, 824 and/or 825. If you find them, could you post a couple of examples, also of any other high severity errors in the error log.

--
Gail Shaw
SQL Server MVP
Go to Top of Page

paulrandal
Yak with Vast SQL Skills

899 Posts

Posted - 2010-04-21 : 12:07:22
Interesting discussion between Kristen and Gail - Gail is correct.

The example given about updating a record and then reading it back to the client won't find corruption as the page almost certainly wouldn't have been written out to disk and then flushed from the buffer pool by the lazywriter before the subsequent record read - so there's no checksum checking as the page remains in memory.

Paul S. Randal,
Managing Director, SQLskills.com
Blog: www.SQLskills.com/blogs/paul Twitter: twitter.com/PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of SQL 2005 DBCC CHECKDB/repair code
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-04-21 : 12:51:48
Yup, I've come to that realisation brain going faster than my reality-module

But surely having VERIFY CHECKSUM is more likely to raise an error that there is a corruption "sooner"?

A newly added record has a high probability of being re-read in most of our APPs. For example, adding an item to an order, updates the order header (for the running total). Each item you add updates the order header again, although the Order Header might still be in cache ... but after some delay someone will run the New Orders report ... and there will be a data transfer to the Order Fulfilment system in the next hourly run ... and then I expect we have some poor quality query-plans in the APP such that a table scan is made of the Order Header and Order Items tables. So if there is a corruption during WRITE of that Order Header / Item its pretty likely it will get re-read from disk before the overnight DBCC CHECKDB. Not necessarily true for ALL tables, and it can't be relied on, but I still think the odds are good that VERIFY CHECKSUM will pick up the corruption before the overnight DBCC CHECKDB does.

If that's not the case then why has CHECKSUM option been added? and why use it at all? might as well just rely on the next DBCC CHECKDB and same some CPU time on CHECKSUM maths ...
Go to Top of Page

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-04-21 : 15:25:50
quote:
Originally posted by Kristen

But surely having VERIFY CHECKSUM is more likely to raise an error that there is a corruption "sooner"?


Sooner than what? The next checkDB? Only if the page is read and, in most cases corruption would throw errors anyway due to damaged torn page bits, damaged header or damaged row structure.
Sooner than torn page protection would? Typically no.

quote:
If that's not the case then why has CHECKSUM option been added?


To *reliably* detect that a page has been modified on disk outside of SQL Server. That's something that torn page couldn't do reliably and, when it comes down to finger pointing between the DBA and the storage/system admin over who's fault it is, a checksum proves (yes) that the damage to the page happened outside of SQL Server.

It's not about how fast the corruption is detected. It's about how reliably it's detected. Without checksum, there are some forms of corruption that can go unnoticed (I alluded to them in earlier post), even by checkDB.

It's not checksum vs checkDB. Checksum is a replacement for torn page protection. That's what you should compare it with.

--
Gail Shaw
SQL Server MVP
Go to Top of Page

DanielDucharme
Starting Member

11 Posts

Posted - 2010-04-21 : 15:27:35
Ok I have checked out the database as asked and here is what I have:

Page Verify had been set to none and is now CHECKSUM

SQL Version is: Microsoft SQL Server 2005 - 9.00.3353.00 (Intel X86) Aug 14 2009 18:07:44 Copyright (c) 1988-2005 Microsoft Corporation Workgroup Edition on Windows NT 5.2 (Build 3790: Service Pack 2)

DBCC CHECKDB is giving the following output although at this moment the database is still working:

Msg 8946, Level 16, State 12, Line 2
Table error: Allocation page (1:461016) has invalid PFS_PAGE page header values. Type is 0. Check type, alloc unit ID and page ID on the page.
Msg 8921, Level 16, State 1, Line 2
Check terminated. A failure was detected while collecting facts. Possibly tempdb out of space or a system table is inconsistent. Check previous errors.
Msg 8904, Level 16, State 1, Line 2
Extent (1:382352) in database ID 9 is allocated by more than one allocation object.
Msg 8913, Level 16, State 1, Line 2
Extent (1:382352) is allocated to 'GAM' and at least one other object.
Msg 8904, Level 16, State 1, Line 2
Extent (1:431504) in database ID 9 is allocated by more than one allocation object.
Msg 8913, Level 16, State 1, Line 2
Extent (1:431504) is allocated to 'GAM' and at least one other object.
Msg 8966, Level 16, State 4, Line 2
Unable to read and latch page (1:461016) with latch type SH. UtilDbccVerifyPageId failed.
CHECKDB found 4 allocation errors and 1 consistency errors not associated with any single object.
Msg 8913, Level 16, State 3, Line 2
Extent (1:382352) is allocated to 'sys.sysrowsetcolumns' and at least one other object.
Msg 8913, Level 16, State 3, Line 2
Extent (1:431504) is allocated to 'sys.sysrowsetcolumns' and at least one other object.
Msg 2579, Level 16, State 1, Line 2
Table error: Extent (1:464272) in object ID 4, index ID 1, partition ID 262144, alloc unit ID 262144 (type In-row data) is beyond the range of this database.
Msg 2579, Level 16, State 1, Line 2
Table error: Extent (1:472464) in object ID 4, index ID 1, partition ID 262144, alloc unit ID 262144 (type In-row data) is beyond the range of this database.
Msg 2579, Level 16, State 1, Line 2
Table error: Extent (1:474520) in object ID 4, index ID 1, partition ID 262144, alloc unit ID 262144 (type In-row data) is beyond the range of this database.
CHECKDB found 5 allocation errors and 0 consistency errors in table 'sys.sysrowsetcolumns' (object ID 4).
CHECKDB found 9 allocation errors and 1 consistency errors in database 'NorthProvidence'.


Here are some errors from the log:

2010-04-01 15:48:40.57 spid63 Error: 605, Severity: 21, State: 3.
2010-04-01 15:48:40.57 spid63 Attempt to fetch logical page (1:382352) in database 9 failed. It belongs to allocation unit 72057599604621312 not to 262144.
2010-04-01 15:48:40.70 spid63 Error: 605, Severity: 21, State: 3.
2010-04-01 15:48:40.70 spid63 Attempt to fetch logical page (1:382352) in database 9 failed. It belongs to allocation unit 72057599604621312 not to 262144.

2010-04-01 16:33:20.89 spid60 Using 'dbghelp.dll' version '4.0.5'
2010-04-01 16:33:20.92 spid60 **Dump thread - spid = 60, PSS = 0x1B79D608, EC = 0x1B79D610
2010-04-01 16:33:20.92 spid60 ***Stack Dump being sent to C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\SQLDump0165.txt
2010-04-01 16:33:20.93 spid60 * *******************************************************************************
2010-04-01 16:33:20.93 spid60 *
2010-04-01 16:33:20.93 spid60 * BEGIN STACK DUMP:
2010-04-01 16:33:20.93 spid60 * 04/01/10 16:33:20 spid 60
2010-04-01 16:33:20.93 spid60 *
2010-04-01 16:33:20.93 spid60 * DBCC database corruption
2010-04-01 16:33:20.93 spid60 *
2010-04-01 16:33:20.93 spid60 * Input Buffer 46 bytes -
2010-04-01 16:33:20.93 spid60 * DBCC CHECKDB
2010-04-01 16:33:20.93 spid60 *
2010-04-01 16:33:20.93 spid60 * *******************************************************************************
2010-04-01 16:33:20.93 spid60 * -------------------------------------------------------------------------------
2010-04-01 16:33:20.93 spid60 * Short Stack Dump
2010-04-01 16:33:21.01 spid60 Stack Signature for the dump is 0x00000139
2010-04-01 16:33:27.20 spid60 External dump process return code 0x20000001.
External dump process returned no errors.
2010-04-01 16:33:27.20 spid60 DoMiniDump () encountered error (0x800703E6) - Invalid access to memory location.

2010-04-01 16:33:27.28 spid60 DBCC CHECKDB (NorthProvidence) executed by apro terminated abnormally due to error state 2. Elapsed time: 0 hours 0 minutes 8 seconds.

2010-04-07 01:56:48.04 spid56 Error: 605, Severity: 21, State: 3.
2010-04-07 01:56:48.04 spid56 Attempt to fetch logical page (1:382355) in database 9 failed. It belongs to allocation unit 72057599602982912 not to 262144.
2010-04-07 01:56:48.34 spid56 Error: 605, Severity: 21, State: 3.
2010-04-07 01:56:48.34 spid56 Attempt to fetch logical page (1:382355) in database 9 failed. It belongs to allocation unit 72057599602982912 not to 262144.
2010-04-07 01:56:48.40 spid56 Error: 3314, Severity: 21, State: 4.
2010-04-07 01:56:48.40 spid56 During undoing of a logged operation in database 'NorthProvidence', an error occurred at log record ID (33672:11347:50). Typically, the specific failure is logged previously as an error in the Windows Event Log service. Restore the database or file from a backup, or repair the database.
2010-04-07 01:56:48.49 spid56 Error: 9001, Severity: 21, State: 1.
2010-04-07 01:56:48.49 spid56 The log for database 'NorthProvidence' is not available. Check the event log for related error messages. Resolve any errors and restart the database.
2010-04-07 01:56:48.49 spid56 Error: 3314, Severity: 21, State: 5.
2010-04-07 01:56:48.49 spid56 During undoing of a logged operation in database 'NorthProvidence', an error occurred at log record ID (33672:11347:1). Typically, the specific failure is logged previously as an error in the Windows Event Log service. Restore the database or file from a backup, or repair the database.
2010-04-07 01:56:49.64 spid20s SQL Server has encountered 2 occurrence(s) of cachestore flush for the 'Object Plans' cachestore (part of plan cache) due to some database maintenance or reconfigure operations.
2010-04-07 01:56:49.65 spid20s SQL Server has encountered 2 occurrence(s) of cachestore flush for the 'SQL Plans' cachestore (part of plan cache) due to some database maintenance or reconfigure operations.
2010-04-07 01:56:49.65 spid20s SQL Server has encountered 2 occurrence(s) of cachestore flush for the 'Bound Trees' cachestore (part of plan cache) due to some database maintenance or reconfigure operations.
2010-04-07 01:56:50.07 spid20s Starting up database 'NorthProvidence'.
2010-04-07 01:56:53.48 spid20s Analysis of database 'NorthProvidence' (9) is 100% complete (approximately 0 seconds remain). This is an informational message only. No user action is required.
2010-04-07 01:56:54.46 spid20s 3 transactions rolled forward in database 'NorthProvidence' (9). This is an informational message only. No user action is required.
2010-04-07 01:56:54.73 spid20s Error: 605, Severity: 21, State: 3.
2010-04-07 01:56:54.73 spid20s Attempt to fetch logical page (1:382355) in database 9 failed. It belongs to allocation unit 72057599602982912 not to 262144.
2010-04-07 01:56:54.74 spid20s Error: 3314, Severity: 21, State: 1.
2010-04-07 01:56:54.74 spid20s During undoing of a logged operation in database 'NorthProvidence', an error occurred at log record ID (33672:11347:50). Typically, the specific failure is logged previously as an error in the Windows Event Log service. Restore the database or file from a backup, or repair the database.
2010-04-07 01:56:54.74 spid20s Error: 3414, Severity: 21, State: 2.
2010-04-07 01:56:54.74 spid20s An error occurred during recovery, preventing the database 'NorthProvidence' (database ID 9) from restarting. Diagnose the recovery errors and fix them, or restore from a known good backup. If errors are not corrected or expected, contact Technical Support.

2010-04-07 16:42:59.40 spid54 Starting up database 'NorthProvidence'.
2010-04-07 16:43:01.23 spid54 Analysis of database 'NorthProvidence' (9) is 100% complete (approximately 0 seconds remain). This is an informational message only. No user action is required.
2010-04-07 16:43:01.60 spid54 3 transactions rolled forward in database 'NorthProvidence' (9). This is an informational message only. No user action is required.
2010-04-07 16:43:01.78 spid54 Error: 605, Severity: 21, State: 3.
2010-04-07 16:43:01.78 spid54 Attempt to fetch logical page (1:382355) in database 9 failed. It belongs to allocation unit 72057599602982912 not to 262144.
2010-04-07 16:43:01.79 spid54 Error: 3314, Severity: 21, State: 1.
2010-04-07 16:43:01.79 spid54 During undoing of a logged operation in database 'NorthProvidence', an error occurred at log record ID (33672:11347:50). Typically, the specific failure is logged previously as an error in the Windows Event Log service. Restore the database or file from a backup, or repair the database.
2010-04-07 16:43:01.81 spid54 Error: 3414, Severity: 21, State: 2.
2010-04-07 16:43:01.81 spid54 An error occurred during recovery, preventing the database 'NorthProvidence' (database ID 9) from restarting. Diagnose the recovery errors and fix them, or restore from a known good backup. If errors are not corrected or expected, contact Technical Support.

Thanks for all the help and if you need any other information let me know
Go to Top of Page

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-04-21 : 15:33:04
Ouch. Not much I really need to say there. That's irreparable damage to system tables and allocation structures, SQL's hitting that while recovering the database and taking the DB suspect
Doesn't look like a SQL bug (at least not to me). I would still say, after looking at this, it's IO subsystem.

Paul may have further input.

--
Gail Shaw
SQL Server MVP
Go to Top of Page

DanielDucharme
Starting Member

11 Posts

Posted - 2010-04-23 : 13:13:57
Ok today I tried to log onto the database and wouldn't you know it has corrupted. Running DBCC CHECKDB resulted with the following error:

Msg 7984, Level 16, State 1, Line 1
System table pre-checks: Object ID 4. Page (1:382353) has unexpected page type 2. Check statement terminated due to unrepairable error.

As for SQLIOSIM I had it run 100 times and the only errors were that it could not query the disk cache which is turned off so that is fine. I will be running a hardware stress test today to see if I can get any of the hardware to fail and will let you know if I find anything, thanks for all of the help thus far.
Go to Top of Page

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-04-23 : 14:30:05
Can you move the database onto other disks? Even local disks? See if the problems continue after doing so.

Can you also get the server admin to check and update firmware and drivers for the storage subsystem?

Also, can you apply SP3 to SQL Server? That looks like one of the SP2+CU versions.

--
Gail Shaw
SQL Server MVP
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-04-23 : 14:31:19
I've edited Daniel's posts to remove the code tags as it was making this topic very, very wide. Normally I would just add carriage returns to the data between the code tags, but the data was too long and I just don't have the time. Removing the code tags doesn't appear to make the data less readable in this case, so it's a happy medium (for me at least, haha!)

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

DanielDucharme
Starting Member

11 Posts

Posted - 2010-04-23 : 14:56:02
Sorry about the code tags, I thought that is how it should be formatted and will leave them out from now on in this topic.

Sadly there is only one set of RAID 1 disks on the server and while there are 2 partitions on the disks the system partition only has 900MB free and the databases are currently on the non-system partition.

I am also the server admin so I will check on those firmware and drivers and will apply SP3 and see if it helps at all. Thanks
Go to Top of Page

paulrandal
Yak with Vast SQL Skills

899 Posts

Posted - 2010-04-23 : 19:25:50
Your IO subsystem is badly corrupting your databases - you've got transaction log corruption, system catalog corruption, allocation bitmap corruption.

Even though the database might appear usable, these corruptions are not repairable through any means and who knows what state your actual data is in. You need to get the database off these drives ASAP or you're in danger of losing everything.

Thanks

Paul S. Randal,
Managing Director, SQLskills.com
Blog: www.SQLskills.com/blogs/paul Twitter: twitter.com/PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of SQL 2005 DBCC CHECKDB/repair code
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-04-23 : 20:38:44
Daniel, make sure you look at the very last line in Paul's signature. He is the author of DBCC CHECKDB, so he is the ultimate expert on this subject.

Plus he's a pretty cool dude too. (I'm just buttering him up.)

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page
    Next Page

- Advertisement -