Author |
Topic |
mpreissner
Starting Member
21 Posts |
Posted - 2011-02-24 : 08:39:52
|
I'm managing a SQL 2005 database that holds CDR data for a large telecom project. My database is currently about 120 GB and backups are taking forever (8+ hours). We are using NeverFail to handle replication/failover. The hardware is a little on the slow side in my opinion: 3x 15k drives in a RAID 5 config (not my choice). For comparison, another database I manage successfully backs up 145 GB in 40 minutes, with little difference in the hardware. Can anyone help me figure out what is slowing down the backups so much? CPU and memory utilization are well within a tolerable range (CPU rarely goes over 20%, memory is about 75%). Thanks in advance! |
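For a sense of scale, the effective throughput implied by those numbers can be worked out directly (a quick sketch using only the figures quoted above):

```sql
-- Back-of-envelope effective backup throughput, from the numbers in this post:
-- 120 GB in 8+ hours vs. 145 GB in 40 minutes.
SELECT (120.0 * 1024) / (8 * 3600) AS slow_mb_per_sec,  -- roughly 4.3 MB/sec
       (145.0 * 1024) / (40 * 60)  AS fast_mb_per_sec;  -- roughly 61.9 MB/sec
```

So the slow server is moving data at around 4 MB/sec, about 14x slower than the comparison server, which is far below what even a modest RAID 5 array should sustain for sequential reads.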
|
mpreissner
Starting Member
21 Posts |
Posted - 2011-02-24 : 09:16:45
|
oops...just realized I probably should've posted this in Administration... |
|
|
RickD
Slow But Sure Yak Herding Master
3608 Posts |
Posted - 2011-02-24 : 09:51:34
|
Do you have a lot of locks on the database while the backup is running? Do you have split pages or very wide rows? What other maintenance tasks are you performing on the database/server? |
|
|
mpreissner
Starting Member
21 Posts |
Posted - 2011-02-24 : 10:04:58
|
Good questions... I'm coming into this a little late in the game. I didn't know anything about SQL two years ago, and I've had no formal training. This database had no maintenance tasks set up on it when it was given to me. I have since set up a weekly CHECKDB and SHRINK, as well as nightly full backups and hourly TLog backups. I haven't had much chance to really dig into it. The only thing I did notice is that the database is structured as an .mdf, an .ldf, and three .ndf files. I'm not sure if you're familiar with ISI, but it's their database product for Nortel CDR data... not sure if that helps at all. I know how to check for locks, and I'll get that data as soon as I can get back to the server, but how do I check for split pages? |
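For the lock check, one option on SQL 2005 is the sys.dm_tran_locks DMV; a minimal sketch, assuming a placeholder database name:

```sql
-- Snapshot of current locks grouped by type (SQL Server 2005+).
-- 'YourCdrDb' is a placeholder for the actual database name.
SELECT resource_type, request_mode, request_status,
       COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID('YourCdrDb')
GROUP BY resource_type, request_mode, request_status
ORDER BY lock_count DESC;
```

Page splits themselves are easiest to watch as the perfmon counter SQL Server: Access Methods - Page Splits/sec while the load is running.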
|
|
mpreissner
Starting Member
21 Posts |
Posted - 2011-02-24 : 15:17:02
|
OK, locks/sec hovers around 20 but spikes up to about 250 or more every few seconds; same with transactions/sec. Current backup throughput is averaging about 5 MB/sec while backing up to local disks. The HDTune utility shows the array is capable of over 100 MB/sec, so it doesn't appear to be a hardware issue. |
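For what it's worth, the backup itself can report its own throughput in the completion message; a minimal sketch, where both the database name and path are placeholders:

```sql
-- STATS = 10 prints a progress message every 10 percent, and the final
-- "BACKUP DATABASE successfully processed..." message includes MB/sec.
-- Database name and disk path below are placeholders.
BACKUP DATABASE YourCdrDb
TO DISK = N'D:\Backups\YourCdrDb.bak'
WITH INIT, STATS = 10;
```

Comparing that reported rate against HDTune's raw number should show how much of the gap is the backup itself versus the array.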
|
|
Yeoh Ray Mond
Starting Member
49 Posts |
Posted - 2011-02-28 : 12:21:06
|
Is there a reason you need to shrink the database on a weekly basis? See here on why shrinking (especially data files) is a bad idea. It may also result in heavily fragmented files on disk if the file has to constantly shrink and grow. You may also want to measure just the read throughput during a backup, which then lets you compute the write throughput. See here on how to do that.

Ray Mond
RESTORE VERIFYONLY - don't bet your job on it! Learn why here. |
|
|
yadhu_cse
Constraint Violating Yak Guru
252 Posts |
Posted - 2011-03-01 : 01:30:18
|
use row partition |
|
|
RickD
Slow But Sure Yak Herding Master
3608 Posts |
Posted - 2011-03-01 : 05:41:14
|
quote: Originally posted by yadhu_cse use row partition
For what exactly? I think the OP has bigger problems than this, so random word generation does not help. |
|
|
mpreissner
Starting Member
21 Posts |
Posted - 2011-03-01 : 08:57:09
|
The only reason I incorporated a weekly shrink was that it was what the "experts" set up on the first database I was given to manage. I'm in no way an expert when it comes to SQL; everything I know, I've learned on the job - no study materials, no classes. If you think it advisable, I can discontinue the shrink (it barely appears to be having any effect anyway - this DB doesn't appear to grow very fast). Read and write throughput during a backup sit around 3-5 MB/sec, with spikes up to around 8 MB/sec. The drives hosting the database appear to be heavily fragmented. I'm running a DBCC SHOWCONTIG now to see what kind of shape the indexes are in. |
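As an aside, on SQL 2005 the sys.dm_db_index_physical_stats DMV gives the same fragmentation picture as DBCC SHOWCONTIG, but as an ordinary query; a sketch run in the context of the current database:

```sql
-- Fragmentation per index for the current database (SQL 2005+ replacement
-- for DBCC SHOWCONTIG). LIMITED mode scans only upper index levels, so it
-- is relatively cheap to run.
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name                     AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id
 AND i.index_id  = ips.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;
```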
|
|
NeilG
Aged Yak Warrior
530 Posts |
Posted - 2011-03-01 : 09:42:55
|
quote: Originally posted by mpreissner was set up by the "experts" on the first database that I was given to manage.
This could be a cause of the problem, so I'd ignore your experts and try to get advice from different sources, or trust your own judgement more. Has there been any re-indexing or similar maintenance carried out to reduce fragmentation? Backups could take longer if the indexes are badly fragmented.

-----------------------------------------------
Learning something new on SQL Server everyday. |
|
|
mpreissner
Starting Member
21 Posts |
Posted - 2011-03-01 : 10:26:06
|
There was a reindexing job that our "experts" had us put in place. It would run every Sunday, but it was taking 26+ hours to complete and interfered with data processing, so we elected to stop it since we never saw any improvement in performance once the reindexing was complete. The T-SQL statement they gave us did a blanket rebuild of all indices using a fill factor of 90.

So here are the results of the fragmentation check: 433 objects returned; 115 objects showed fragmentation greater than 0%; 85 of those were fragmented more than 30%; 31 objects were fragmented more than 90%. |
|
|
RickD
Slow But Sure Yak Herding Master
3608 Posts |
Posted - 2011-03-01 : 11:01:51
|
Those 31 objects are your main write tables, I guess. You may want to check out Tara's isp_DBCC_INDEXDEFRAG script. |
|
|
RickD
Slow But Sure Yak Herding Master
3608 Posts |
Posted - 2011-03-01 : 11:02:28
|
Oh, and it wouldn't hurt to have a look at the isp_UPDATE_STATISTICS while you are at it. |
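If the isp_ scripts aren't handy, the built-in equivalents look roughly like this (the table name below is a placeholder, not one from this database):

```sql
-- Refresh out-of-date statistics across the whole database:
EXEC sp_updatestats;

-- Or target a single heavy-hitting table with a full scan
-- ('dbo.YourBigCdrTable' is a placeholder):
UPDATE STATISTICS dbo.YourBigCdrTable WITH FULLSCAN;
```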
|
|
mpreissner
Starting Member
21 Posts |
Posted - 2011-03-01 : 12:20:46
|
I'll take a look at those... In the meantime, I found a script on the MSDN website that automates rebuilding or reorganizing indices based on their level of fragmentation. http://msdn.microsoft.com/en-us/library/ms188917.aspx is the link - section D under Examples. It seems to me that I would benefit from using a much lower fill factor than what the "experts" suggested. It only took a week and a half since the last index rebuild for our heavy-hitting tables to go over 90% fragmentation, so I'm thinking I should set the fill factor much lower when I re-implement reindexing, and then do a reorganize (defrag index) more often to keep it from getting out of hand... thoughts? |
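The core of that rebuild-or-reorganize decision boils down to two statements, using the commonly cited 5%/30% thresholds (index and table names are placeholders, and the fill factor of 70 is just an example of "much lower than 90"):

```sql
-- Between roughly 5% and 30% fragmentation: reorganize (online, cheap).
ALTER INDEX IX_Cdr_CallDate ON dbo.CdrRecords REORGANIZE;

-- Above roughly 30%: rebuild, optionally with a lower fill factor to
-- leave room for inserts and slow re-fragmentation.
ALTER INDEX IX_Cdr_CallDate ON dbo.CdrRecords
    REBUILD WITH (FILLFACTOR = 70);
```

The MSDN example linked above wraps exactly this choice in a cursor over sys.dm_db_index_physical_stats.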
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2011-03-01 : 12:55:01
|
There seem to be many issues with your database, but nothing really relevant to why SQL Server would take a long time to back up. Bad index fragmentation would not cause this, because backups simply deal with physical database pages, not indexes. You should probably look for physical IO issues first to make sure you don't have problems there. I would start by testing all the database drives and backup drives to make sure they can really support high IO rates. I recommend testing with SQLIO using large test file sizes, like 10 GB, to verify that the good IO rate you are seeing is not a result of caching on the array controller.

CODO ERGO SUM |
|
|
mpreissner
Starting Member
21 Posts |
Posted - 2011-03-01 : 13:01:40
|
Unfortunately, my database drives and backup drives are one and the same... my company seriously skimped when it came to hardware. However, another database I work with is 20 GB larger and backs up in 40 minutes on similar hardware, versus the 8+ hours of the one I'm trying to diagnose. I'll give SQLIO a shot and see what it comes up with. |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
mpreissner
Starting Member
21 Posts |
Posted - 2011-03-01 : 13:23:59
|
I haven't made any changes yet...I only have the production system to work with, as our design team doesn't let anyone use the lab. Our system is for a gov't contract, and requires 24x7 availability, so I don't like to make any big changes during the work week... |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
mpreissner
Starting Member
21 Posts |
Posted - 2011-03-01 : 13:41:21
|
Tara - I'll get the perfmon counter info tomorrow morning, as the backup will still be running by the time I get into the office. Just as a point of reference, with SQLIO doing random reads, Avg Disk sec/Read stays mostly between .050 and .085, but spikes up to .115 or so. SQLIO tests using a 10 GB file with sequential reads and writes show throughput of about 172 MB/sec for reads and 170 MB/sec for writes. I'm running random read/write tests now to compare. |
|
|
mpreissner
Starting Member
21 Posts |
Posted - 2011-03-01 : 13:46:18
|
OK... random read throughput from SQLIO using the same 10 GB file was only about 5 MB/sec, which is exactly the throughput I'm getting on my backups... |
|
|