Partitioning Databases in an Application Service Provider Model
By Sean Baird
on 21 September 2000
| Tags: Database Design
Yesterday, Michael asked a very good question about horizontally partitioning a table to increase query performance. He also asked a followup question that I found interesting:
"Suppose I was opening a web site that provided online accounting software to businesses (such as netledger.com). So one day, I may have 3 businesses subscribing and the next day 2 more may subscribe. Is it best to store all of these customers' information (invoices, customers, employees, accounts payables) together in the same database and tables? Or, is it best to make a database for each of the customers and store this db name so the software would connect to the correct database for a given user?"
The company I work for is more or less an Application Service Provider like NetLedger.com, so I have a few opinions on the matter :)
Well, Michael, I might take a different stance than others on this topic. I prefer
a number of smaller application databases, one for each client, rather than one large database for all clients. Here's why:
- Data isolation - SQL Server no longer corrupts databases seemingly at random like in the olden days. But if for some reason you do lose an application database, you only impact one client while you scramble to find a backup. (You did make a backup, right?)
- Ease of customization - This may not apply to NetLedger.com, but we do some customization work for each client. More often than not, a client's odd business rule impacts the database, and it's easier to make the change if we're not sharing the database among multiple clients.
- More flexible backups - If you have one client that heavily modifies their data and another that infrequently modifies their data, you could custom tailor a backup strategy for each.
- More flexible load balancing - If you have a single database for all clients, and you sign up a lot of new clients one year, your database may increase in size and become unmanageable from an administrative perspective (backups, DBCC's, etc.) or a performance perspective (queries slow down due to dramatically larger rowcounts). Not to mention you may have to buy faster hardware to support the increased load.
If you have one database per client, and you add a lot of clients, no problem. Each database should stay at a manageable size. If your server becomes bogged down, you can buy another similar server and move half the clients to the new server. This may be cheaper than upgrading an existing server. (For instance, going from a 4-processor box to an 8-processor box typically means not only a large investment in hardware, but a significant increase in licensing costs, especially if you need to go to an "Enterprise Edition" of your OS and SQL Server.)
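To make the flexible-backup point concrete, here's a rough sketch of what per-client strategies might look like. The client names and file paths are made up for illustration:

```sql
-- Hypothetical client names and paths.
-- A client that heavily modifies data gets a full backup plus
-- frequent transaction log backups...
BACKUP DATABASE ClientBusy TO DISK = 'E:\Backup\ClientBusy.bak' WITH INIT
BACKUP LOG ClientBusy TO DISK = 'E:\Backup\ClientBusy_log.bak'

-- ...while a client that rarely touches their data might only
-- need a nightly full backup.
BACKUP DATABASE ClientQuiet TO DISK = 'E:\Backup\ClientQuiet.bak' WITH INIT
```

You'd schedule each of these as separate jobs on whatever interval makes sense for that client, rather than forcing everyone onto one monolithic backup window.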
Now, there are some negative side effects of having one database per client:
- More administration required - e.g. instead of having one database to back up, you have 50.
- Less efficient data caching - if you have a lot of domain tables that aren't appreciably different for each client, then SQL Server will need to cache the domain table in each database, thus reducing your data cache availability for "real" data.
- Less efficient procedure caching - the same problems that apply to caching domain tables can also occur with your procedure cache if you use a lot of identical stored procedures in each database.
There are workarounds to these problems. Backups can be automated. Domain tables can be moved to a common database (and you can make the database read-only if appropriate, to reduce locking overhead). Common procedures can be combined in a common database (with the domain tables, if you like) and be modified to work with each client.
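The common-database workaround boils down to three-part naming. As a sketch (the "Common" database and the table/column names are hypothetical):

```sql
-- Mark the shared database read-only to reduce locking overhead
-- (sp_dboption is the SQL Server 7/2000-era way to do this).
USE master
EXEC sp_dboption 'Common', 'read only', 'true'

-- Then, from inside any client database, join against the single
-- shared copy of a domain table via its three-part name:
SELECT i.InvoiceID, s.StateName
FROM dbo.Invoices i
JOIN Common.dbo.States s ON s.StateCode = i.StateCode
```

Every client database hits the same cached pages for Common.dbo.States instead of caching fifty identical copies.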
Now that I think about it, in SQL Server 2000, you should be able to partition the client data among multiple client databases and use partitioned views to create a database that looks like it contains all of the client info. (SQL Server 2000 now allows you to update partitioned views, so you could get away with this.) Huh, that might be another interesting way around the data/proc caching problem. I'd be curious to see what happened if a client database was offline; I wonder if reads and writes to the online portions of the views would work.
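The partitioned view idea would look something like this. Assume (hypothetically) each client database has an Invoices table whose primary key includes a ClientID column, with a CHECK constraint pinning ClientID to one value per database; that CHECK constraint is what lets SQL Server 2000 treat the view as a partitioned view and route updates to the right member table:

```sql
-- Hypothetical database/table names. Each member table carries
-- something like: CHECK (ClientID = 1), CHECK (ClientID = 2), etc.
CREATE VIEW dbo.AllInvoices
AS
SELECT ClientID, InvoiceID, InvoiceDate, Amount FROM ClientA.dbo.Invoices
UNION ALL
SELECT ClientID, InvoiceID, InvoiceDate, Amount FROM ClientB.dbo.Invoices
UNION ALL
SELECT ClientID, InvoiceID, InvoiceDate, Amount FROM ClientC.dbo.Invoices
```

Queries that filter on ClientID should touch only the relevant member database, which is what makes the scheme plausible at all.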
Partitioning client data among multiple databases works really well for us. We have about 250 clients right now. Most clients have different business rules that require us to make subtle changes to our database schema. If we had to glom all of those changes into one database, I'm afraid that it would be a big mess.
In addition, each client's data is read-only and refreshed periodically (by detaching the old database and attaching a new copy, believe it or not.) Different clients have different refresh schedules, so having their data split out makes it easy to update one client's data without affecting another.
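For the curious, the detach/attach refresh is just a pair of system procedures. A rough sketch, with made-up database names and file paths:

```sql
-- Hypothetical names/paths. Kick out any connections, detach the
-- old copy, swap in the refreshed files, and reattach.
USE master
ALTER DATABASE ClientA SET SINGLE_USER WITH ROLLBACK IMMEDIATE
EXEC sp_detach_db 'ClientA'

-- (copy the refreshed .mdf/.ldf into place outside of SQL Server)

EXEC sp_attach_db 'ClientA',
    'D:\Data\ClientA.mdf',
    'D:\Data\ClientA_log.ldf'
```

Since each client lives in their own database, this whole swap happens without touching anyone else's data.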
And to top it all off, we maintain multiple copies of each database on multiple SQL Servers (it's read-only data, remember, so we can get away with this sort of foolishness). This gives us a lot of redundancy, and we can add capacity quickly by deploying the appropriate databases to an additional server and informing the application that it has another SQL Server to choose from. And since we take a ton of traffic, we appreciate the ability to add capacity easily :)