
 Real World High Availability in SQL Server (Long)


TonyTheDBA
Posting Yak Master

121 Posts

Posted - 2005-11-29 : 05:19:02
Before I start, can I just say how impressed I am with the quality and knowledge of the people on this site. You people rock!

OK, on to my issue. Last week we had a hardware and people failure in our organisation, NOT related to our database servers. One of the drives in our main filestore server failed. It was RAID 5 with a hot spare drive, so no problem, you'd think. The failed drive was removed, the hot spare came online and started rebuilding, everything normal. Then one of the operators, who wasn't aware of the problem and the fix, decided to reboot the server to see if that would fix it. It didn't; it totally screwed the array, so they had to fall back to a reformat and a restore from backup (incremental??): 680GB at 18GB/hour, and 4 days later we had our filestore back. This has focused the mind of high-level management (the CEO) on IT again; while it's reliable it tends to slip. We now have a meeting coming up to look at how we can avoid this in the future, and I have been tasked with putting together what we need for SQL Server.

We have one system that needs high availability (we offer an FM service to other organisations on it, and are in Bon Jovi Mode ATM), and our suppliers are proposing a hardware solution: two SANs at separate sites linked by dark fibre, using blade technology, SAN hardware-based replication, and Doubletake snapshotting. The idea is to extend this to our filestore and Exchange mailboxes. The other benefit is that we can consolidate our existing 18+ SQL Servers and their databases onto a more robust and reliable platform.

I have been looking at using database mirroring and cluster technology on SQL 2005 instead, still with a SAN architecture, but removing the hardware mirroring and Doubletake (although I think it might be useful for backups). I have looked at the TechNet material on high availability and know what Microsoft claims about how robust and responsive failover and mirroring are. What I would be interested in hearing is what it's like in the real world: do you get the 2 - 10 second response, or is it more than this?

Thanks in advance

--
Regards
Tony The DBA

franco
Constraint Violating Yak Guru

255 Posts

Posted - 2005-11-29 : 06:01:00
I think it really depends on various factors, including the size of the database and whether this is a dedicated SQL Server cluster. In our organization that is not the case, because we also have an Oracle database, Domino and a file system on it. It's really a special situation, because normally the cluster is dedicated to one application.
But to answer your question, I have to tell you that SQL Server failover takes place in about 10 to 20 seconds.
Cheers.

Franco

Rovastar
Starting Member

38 Posts

Posted - 2005-11-29 : 06:32:50
Let me get this straight: you want to make your system higher availability and you are proposing to go to SQL Server 2005 for all of your 18 SQL Servers. I have not used SQL 2005 much, but it seems like a big project in itself to upgrade everything.

Now the best way of checking the clustering is to get a test clustered server, copy the databases/apps across yourself, and see what response time you get when you power off the machines.
Do the apps work OK? Do the sessions time out, etc.?
Check the logs, etc.
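
A minimal sketch of how the power-off test above could be timed, assuming Python is available on a client machine. It only checks whether the SQL Server TCP port accepts connections, which is not the same as the databases having finished recovery, so treat the result as a lower bound on what the apps will see. The function names (probe, watch, outage_windows) are made up for this illustration.

```python
import socket
import time


def probe(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch(host, port, duration, interval=0.5):
    """Probe host:port every `interval` seconds for `duration` seconds.

    Returns a list of (timestamp, ok) samples using a monotonic clock.
    """
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.monotonic(), probe(host, port)))
        time.sleep(interval)
    return samples


def outage_windows(samples):
    """Turn [(timestamp, ok), ...] into a list of (start, end) outage windows.

    An outage starts at the first failed probe and ends at the next
    successful one. An outage still open when the samples run out is
    not reported, because its end is unknown.
    """
    windows = []
    down_since = None
    for stamp, ok in samples:
        if not ok and down_since is None:
            down_since = stamp
        elif ok and down_since is not None:
            windows.append((down_since, stamp))
            down_since = None
    return windows
```

Something like outage_windows(watch("your-cluster-virtual-name", 1433, duration=600)), started just before you power off the active node, would give you the (start, end) pairs to subtract. The host name is a placeholder, and 1433 is only the default SQL Server port, so your instance may differ. The resolution is limited by the probe interval plus the connection timeout.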

I would expect some problems when moving all those SQL Servers (I presume non-clustered) wholesale to a clustered environment on a new database server version (2005). Allow time to fix these problems in your assessment of solutions.


Bon Jovi Mode. Nice, not heard that before. :)

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2005-11-29 : 15:09:12
Database mirroring is not available in the RTM version of SQL Server 2005. The MS team wasn't ready for it to be productionized, so you won't find many people who have used it. At PASS 2005, they said that it should be ready during spring. I am not sure how we'll get it installed, whether it'll be in a service pack or in its own release.

Log shipping is still available in 2005 and can be used for disaster recovery purposes. One of the differences between the two DR technologies is that log shipping allows you to have multiple secondary servers, whereas DM does not. We are currently using LS 2000. We plan on moving to DM 2005 as we don't require multiple DR sites.

Tara Kizer
aka tduggan

TonyTheDBA
Posting Yak Master

121 Posts

Posted - 2005-11-30 : 04:37:11
Thanks for the replies so far. I wasn't aware that DM isn't available until the spring, and we are more likely to have implemented the hardware solution by then (typical knee-jerk reaction from our management :< ). Personally I was wondering about clustering as well. I know that geoclustering is something else we can put in place, but is it possible to stretch a local cluster given a large fibre link?

There is a lot to be discussed tomorrow, but I must say that the closer we can get to a fully automatic failover, zero data loss solution over two sites, the better. I'm trying to persuade them that we should also avoid virtualising the SQL Servers using VMware: one hardware failure and we lose multiple servers. My view is that we ought to consolidate our underutilised servers onto a bigger box, which we then configure for failover.

I wish we were allowed to test these things in our own environment, but management are reluctant to spend money just for testing . . . Still, I guess that's what I get paid for . . . Now where did I put that bat?

Regards

Tony

--
Regards
Tony The DBA

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2005-11-30 : 12:56:32
As far as clustering across a large fibre link, can each node connect to the same SAN? If so, then you should be fine.

Our setup is that we have a SAN at our primary site and another one at the DR site. They are separated by about 300 miles. In order for us to do clustering across these sites, we'd have to purchase a hardware solution to mirror the data on the SAN. We are still looking into it. It is a big decision since it is quite pricey.

Zero data loss is just not possible. But you can get close. Here is what we have. We have a cluster on the primary site for maintenance and hardware failures. We then log ship the data to the DR site where another cluster exists in case we ever need to move to the DR site and we need to perform maintenance over there or encounter a hardware failure. With this architecture, we are close to 5 9s.

Be aware that when a failover occurs on the cluster, there is downtime. A failover can take anywhere from 30 seconds to 5-10 minutes, though it should normally be under a minute.
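
To put the "5 9s" figure next to those failover times, here is a rough back-of-the-envelope sketch. It assumes a 365-day year and counts only failover time against the budget (no patching windows or other outages); the function names are invented for the example.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a non-leap year


def downtime_budget_seconds(nines):
    """Allowed downtime per year, in seconds, at `nines` nines of availability.

    For example, 5 nines means 99.999% available, i.e. 0.001% unavailable.
    """
    unavailability = 10 ** (-nines)
    return SECONDS_PER_YEAR * unavailability


def failovers_within_budget(nines, seconds_per_failover):
    """How many failovers of the given length fit into one year's budget."""
    return int(downtime_budget_seconds(nines) // seconds_per_failover)


if __name__ == "__main__":
    budget = downtime_budget_seconds(5)
    print(f"5 nines allows about {budget / 60:.2f} minutes of downtime per year")
    for seconds in (30, 60, 600):
        count = failovers_within_budget(5, seconds)
        print(f"{seconds:>4} s failovers that fit in a year: {count}")
```

Five nines works out to a little over five minutes a year, so a handful of sub-minute failovers fit, but a single 10-minute failover already uses up more than the whole year's budget.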

Tara Kizer
aka tduggan

bakerjon
Posting Yak Master

145 Posts

Posted - 2005-11-30 : 13:08:39
Actually, you can use SQL 2005 Database Mirroring today. It's in the product, but turned off. You can turn it on with Trace Flag 1400, but Microsoft would like for customers who use it to enter a program and provide feedback. They don't really support using DM otherwise.

As far as clustering goes, if the database is 680GB, it is likely that a failover will take longer than 10-20 seconds. More likely it will be closer to 1-3 minutes, if you have a clean failover. Of course, that is all with SQL 2000. The duration is due to the time the database needs to recover. Still, even if it took 10 or 30 minutes, that's much better than 4 days!

If you are planning a 2005 cluster install, the prospects are much better. The DB will be available as soon as rollforward completes, which is much faster than today. You might be able to achieve 10 seconds with 2005.

Jon
-Like a kidney stone, this too shall pass.

http://www.sqljunkies.com/weblog/outerjoin

bakerjon
Posting Yak Master

145 Posts

Posted - 2005-11-30 : 13:11:20
Another thought: have you looked at Merge Replication? You could point clients at different sites to different SQL Servers, and replication will keep the data in sync. The best part: it's free. Well, except for the cost of setting it up and maintaining it! :-D

Jon
-Like a kidney stone, this too shall pass.

http://www.sqljunkies.com/weblog/outerjoin

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2005-11-30 : 13:11:53
Thanks Jon, I didn't realize that about DM. I don't think I'd put something into production that wasn't supported though.


Tara Kizer
aka tduggan

bakerjon
Posting Yak Master

145 Posts

Posted - 2005-12-01 : 10:49:42
Good point. I've actually talked to some people who thought about it, but I steered them away. Like you, I'm interested to see how Microsoft will roll it out.

Jon
-Like a kidney stone, this too shall pass.

http://www.sqljunkies.com/weblog/outerjoin

TonyTheDBA
Posting Yak Master

121 Posts

Posted - 2005-12-02 : 05:10:04
Thanks to everyone, some really good info here. In principle we have agreed that we will go with hardware clustering for our systems. For those applications that do not require 'instant' failover, we will look at using DM, as it will more than likely be released and supported by then.

The other thing is that it looks as though we will get a quick failover as well, as most of our databases are pretty small (10s of GB); we just have lots of them ;) The 680GB was the file store. NAS is not really that quick, is it ;)

Thanks
Tony

--
Regards
Tony The DBA
   
