
 More duff RAID5


Kristen
Test

22859 Posts

Posted - 2006-12-08 : 04:31:00
Is RAID5 worth using at all?

You may remember that, six months or so ago, we had a RAID5 fail during a write. The controller decided to abort the write (rather than continue the write to the remaining good drives), which in turn meant that SQL Server declared a Torn Page [because its write spanned multiple sectors, and they were not collectively complete].

I can't understand why the O/S didn't "retry" in this instance - which would have fixed the problem (i.e. by that time the bad drive would have been locked out).

But having read up extensively on the matter, it seems that this is pretty normal.
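
For reference, torn-page detection in SQL Server 2000 is a per-database option; a minimal sketch of enabling and checking it ('mydb' is a placeholder name):

    -- Stamp each 512-byte sector of an 8KB page on write, so an
    -- incomplete multi-sector write can be detected at read time:
    ALTER DATABASE mydb SET TORN_PAGE_DETECTION ON
    -- Returns 1 if the option is enabled:
    SELECT DATABASEPROPERTYEX('mydb', 'IsTornPageDetectionEnabled')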

Yesterday we had a RAID5 drive fail on a server - which is used for all sorts of DEV and FileShare stuff.

We banged a new drive in and it started rebuilding, then bombed out, marking both of the other drives as failed. We then had to rebuild the RAID from a BIOS reboot - which took 6 hours and, of course, the server was offline the whole time.

Doesn't seem to do what it says on the tin, or be worth the paper it is written on.

What are other people's experiences of drive failures in RAID arrays?

And:

1) I think we should have a hot-spare drive standing by in the drive cage

2) Would using RAID10 exclusively, in your experience, give us better redundancy?

Cheers,

Kristen

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2006-12-08 : 11:26:47
I haven't seen any problems from failures of drives in a RAID 5 array, but the majority of our storage is on SAN, so we have limited exposure, mostly on dev systems.

The problems I have seen are from failure of the RAID controller. That always seems to wipe out the whole array. Maybe our hardware guys just don't know how to replace one without trashing the array; I'm not sure, and since I don't deal with hardware directly, I haven't looked into it.





CODO ERGO SUM

Kristen
Test

22859 Posts

Posted - 2006-12-08 : 11:33:14
But you reckon you've had enough drive failures that you would have known about it if they had been a problem?

Or have you just been lucky and had no failures?

I'm ignorant on SAN - is it inherently more reliable?

Kristen

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2006-12-08 : 12:33:03
We have experienced no data loss due to hardware problems for data stored on the SAN in the last 5 years. We have about 12 TB of SAN storage with 20 attached servers.

The SAN system is a box with a large number of drives in it (200 or so on our system), usually in RAID 5 or 10. Space on each drive is divided into units of about 8 MB, and these basic units are assembled into "metas" that are presented to the server as drives.

There are a number of advantages:
1. Space can be added to a server at any time, if it is available on the SAN, and can be allocated at whatever drive size is needed.
2. Virtual drives can be moved from a failed server to a new server fairly quickly.
3. The SAN system has 32 GB of cache that is shared across all systems, so access is very fast compared to a local disk array, especially for writes. The SAN has its own built-in UPS so that all cached data can be flushed to disk if external power is lost.
4. The vendor monitors the SAN system 24x7, and sends someone immediately with a new component if one starts reporting problems.
5. Usually, a system will have dual controllers connected to different switches for increased performance and for redundancy.

Disadvantages:
1. Cost
2. If it is necessary to shut the SAN system down, all the attached servers are down.




CODO ERGO SUM

Kristen
Test

22859 Posts

Posted - 2006-12-08 : 13:21:34
Advantages:

1) It's the client's money
2) I'd prefer an easier life
3) The client will suffer less downtime

Disadvantages

1) With no downtime, the client will think no thought went into it

Kristen

thecoffeeguy
Yak Posting Veteran

98 Posts

Posted - 2006-12-08 : 13:25:00
RAID 5 is such a fickle thing. I have had some nasty experiences with drives failing and cards failing and it is never pleasant.

For example, I lost a RAID card on a production server (not a SQL box) and the whole thing went south in a hurry. I had to replace the card and a drive (the drive "broke" when the card went screwy) and it made for a very, very, very long night.

Knowing what I know now, I don't think I would ever use RAID 5 again IF I had the proper budget. There are better options out there. A SAN is probably an ideal setup, but there are some RAID options that I think would be better for SQL boxes.

Here is a link I give people when I explain RAID:

http://www.acnc.com/04_00.html

Right now, I am in the process of figuring out how to get our production SQL box out of RAID 5...

If I had a mulligan (at the time, I knew zero about SQL and how to set it up properly), I would look at the following RAID options:

RAID 10

That would be it, but I would also look at SAN or NAS possibilities.

There is also RAID 0+1, which looks similar to RAID 10 but stripes first and mirrors second, so losing a single drive degrades a whole stripe set... the payoff is good performance, but the redundancy is weaker.

HTH

thecoffeeguy
Yak Posting Veteran

98 Posts

Posted - 2006-12-08 : 13:33:33
quote:
Originally posted by Kristen

Advantages:

1) It's the client's money
2) I'd prefer an easier life
3) The client will suffer less downtime

Disadvantages

1) With no downtime, the client will think no thought went into it

Kristen



Do you have a specific budget? I could provide some suggestions if I knew the budget.

MichaelP
Jedi Yak

2489 Posts

Posted - 2006-12-08 : 14:55:06
I've never had an issue with RAID 5 or RAID 10 personally. Every time I have heard of such a thing, it has turned out to be an issue with the RAID card itself - usually a driver or RAID card BIOS problem.
You need to make sure that stuff stays up to date. Also, don't buy cheap / used RAID cards. You are just asking for trouble there.

As far as SANs and DAS units go, they generally have lots of smarts built in to detect and handle all sorts of issues. Needless to say, they are generally not cheap. You are looking at a $20-30K investment to get a DAS up and running. For a 200-disk setup like MVJ has, you are talking serious money, probably north of $500,000.

Michael

<Yoda>Use the Search page you must. Find the answer you will. Cursors, path to the Dark Side they are. Avoid them, you must. Use Order By NewID() to get a random record you will.</Yoda>

Opinions expressed in this post are not necessarily those of TeleVox Software, inc. All information is provided "AS IS" with no warranties and confers no rights.

Kristen
Test

22859 Posts

Posted - 2006-12-08 : 15:07:17
"Do you have a specific budget?"

No, I can just make a cost-benefit proposal. That will go through as-is, albeit sanity-checked via some other route. But recommendations are welcome. We only need 200GB or so. Being able to swap the disks to a fail-over server would be good (it only needs to support SQL data, but if it could also house the data for the IIS box, so that too could fail over or be "farmed", that would be good too).

It's just that we haven't felt the need to justify a SAN before, because we wrongly assumed that RAID was fail-safe. Bad mistake :-(

Kristen

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2006-12-08 : 15:29:33
Kristen,

Care to share the RAID card and controller makers' names?
What type of controller? IDE, SCSI, SATA?

I have had good experiences with SCSI RAID5 systems...
Swapping just worked...





DavidM

Production is just another testing cycle

Kristen
Test

22859 Posts

Posted - 2006-12-09 : 02:13:07
"Care to share the RAID card and controller makers names?"

Sure. It's all Dell kit as far as I know. I'll get the details.

Kristen

Kristen
Test

22859 Posts

Posted - 2006-12-09 : 04:03:39
They are Dell PowerEdge 2600s using the PERC 4/Di 128MB embedded RAID controller. The disks are Maxtor 73GB 10K hot-plug SCSI drives.

Kristen

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2006-12-09 : 13:02:38
Just use RAID 10. Drives are cheap now, and RAID 10 has so many advantages over RAID 5:

1. You can lose more than one drive in a RAID 10 set (as long as both halves of the same mirrored pair don't fail)
2. When you lose a drive, performance is not degraded the way a RAID 5 array degrades
3. Read performance is better, since you can read from both the primary and the mirror
4. Write performance is much better, since there are fewer physical IOs in a RAID 10 write than in a RAID 5 write (see the sketch below)
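
To put rough numbers on point 4 (back-of-envelope arithmetic only; the 500 writes/sec workload is an assumed figure, not a benchmark): a small random write on RAID 5 typically costs four physical IOs, while RAID 10 needs only the two mirrored writes.

    DECLARE @writes_per_sec int
    SET @writes_per_sec = 500  -- assumed workload figure
    SELECT @writes_per_sec * 4 AS raid5_ios_per_sec,  -- read data, read parity, write data, write parity
           @writes_per_sec * 2 AS raid10_ios_per_sec  -- write to both halves of the mirror
    -- Returns 2000 vs 1000: RAID 5 needs twice the physical IOs here.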



-ec

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2006-12-09 : 14:17:41
On the PowerEdge 2600 you have very limited configuration options for internal storage. A maximum of two arrays, and a maximum of 5 drives total, I believe. If you have two drives mirrored for the OS, then you are almost forced into using RAID 5 for the remaining array.

You can use an external drive array chassis in order to have more drives for more performance, but it really increases the price of the system. For any server that is going to be IO intensive, you probably need this.

This is another reason why I like the flexibility of SAN. We can use thin 1U servers without considering the number of drives they can hold, the number of array channels, the backplane configuration, etc. It makes a much smaller footprint in the data center to be able to put 20 SQL Servers in half a rack.

Of course, as MichaelP mentioned, SAN comes at a fairly hefty price, so you have to make sure it makes business sense in your situation.








CODO ERGO SUM

Kristen
Test

22859 Posts

Posted - 2006-12-10 : 01:22:36
Well, it probably costs us $1,000 each time there is a drive failure. It's an hour's drive, each way, to the hosting location, and we usually get the guy to hang around until the rebuild has finished.

There are 6 machines in that rack I think ... which could all share a SAN (IIUIC), and we'd get a performance boost on all 6 machines which would extend their usable life ...

Sounds alright to me ...

Kristen

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2006-12-10 : 13:19:36
quote:
Originally posted by Kristen

Well, it probably costs us $1,000 each time there is a drive failure. It's an hour's drive, each way, to the hosting location, and we usually get the guy to hang around until the rebuild has finished.

There are 6 machines in that rack I think ... which could all share a SAN (IIUIC), and we'd get a performance boost on all 6 machines which would extend their usable life ...

Sounds alright to me ...




SAN storage doesn't guarantee a performance boost. In fact, my experience is that SAN storage is slower than properly configured direct-attached storage.

SAN storage is great when you have tons of storage to manage. If that is what you have, then by all means use a SAN. If you just have a handful of servers then it is not worth it imho.

Keep in mind that you will need to purchase HBAs for every system that will connect to the SAN. They are about $1,000 apiece, and you will want two per server. You will also need a Fibre Channel switch at around $5K. SANs are not cheap.

For small installations I still feel that people are better off going with a RAID 10 solution using Ultra320 SCSI. It is much faster, it is much cheaper, and RAID 10 gives you the benefits I mentioned a few posts back.



-ec

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2006-12-10 : 13:22:01
Also, if you haven't seen Joe Chang's posts over at sql-server-performance.com you should definitely check them out.

Here is a great thread to read http://www.sql-server-performance.com/forum/topic.asp?TOPIC_ID=16995



-ec

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-12-10 : 18:18:30
We have SAN RAID5, which I like for the reliability and storage capacity. Not cheap, and RAID10 would be even more expensive. It is not blazingly fast (pretty slow, actually) for large IO operations (Backup/Restore/Checkdb/bulk operations, etc.). For random IO and day-to-day operations it works well, and I suspect the RAID cache helps boost the performance.

We also have local RAID10, which I like very much for the speed and reliability, but it is limited by the server's disk capacity.
The cost is OK: considering the performance/reliability/storage ratios, it is only "bad" on the last of the three.

So far we have not had any bad experiences due to hardware failures, but I know that in my case reliability (and sleep) is the top priority.

rockmoose

Kristen
Test

22859 Posts

Posted - 2007-01-20 : 08:45:11
FWIW, the same thing happened again last week, on the same machine. A drive failed and took the database with it.

And as before, restoring the last full backup, then the diff, then all the TLog backups gave us a clean database - i.e. with ZERO data loss.

The backups are stored on a different array, so they are not affected by a failure in the array storing the MDF.
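
For anyone hitting this thread with the same failure, a minimal sketch of that restore sequence (database name and file paths are placeholders, and there may be more log backups in practice):

    -- Last full backup first, leaving the database able to accept more restores:
    RESTORE DATABASE mydb FROM DISK = 'E:\Backups\mydb_full.bak' WITH NORECOVERY
    -- Then the most recent differential:
    RESTORE DATABASE mydb FROM DISK = 'E:\Backups\mydb_diff.bak' WITH NORECOVERY
    -- Then every TLog backup taken since the diff, in order:
    RESTORE LOG mydb FROM DISK = 'E:\Backups\mydb_log_01.trn' WITH NORECOVERY
    RESTORE LOG mydb FROM DISK = 'E:\Backups\mydb_log_02.trn' WITH NORECOVERY
    -- Bring the database online:
    RESTORE DATABASE mydb WITH RECOVERY
    -- And confirm it really is clean:
    DBCC CHECKDB('mydb')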

Kristen
   
