Single Database or Multiple Databases

By Chris Miller on 6 December 2000 | Tags: Database Design

Richard writes "Which is the way to better performance with Microsoft SQL Server 7.0: using a single database with lots and lots of tables or logically grouping data into multiple databases each with a significantly fewer number of tables? The system is comprised of one server with two hard drives that are mirrored. With multiple databases come multiple log files. Does having multiple transaction log files impair a recovery if needed? Also, is there any benefit in using file groups with this configuration?"

General online performance shouldn't be impacted either by using one large database or a series of smaller databases. SQL Server doesn't cache entire databases, it caches data pages, and it doesn't matter if all of the pages in memory are in one database or if they're spread out across 15 databases. There is a marginal amount of overhead involved for maintaining multiple open files.

Databases should be broken into distinct transaction elements. For example, if you have five tables that are used for order entry, three that are used for order fulfillment, and two that are used for HR, there should be three databases, one for order entry, one for fulfillment, and one for HR. That way, if a user manages to screw up a table in the HR database, you can restore it while leaving the order entry and fulfillment systems online and still rest assured that transactional integrity has been maintained at some level.

If all of the tables were in one database, you would probably have to restore the database completely, which would cause the loss of entered orders and records of fulfilled orders without real cause, or you could restore the database to a new name and try to patch the data--something to avoid if possible.

It does take longer to restore several small databases than one large database. How much longer depends on how fast you can type. The elapsed time for a backup or restore operation is a function of the amount of data in the databases being worked on. If you take a 1 GB database and split it into two 512 MB segments, it's still the same volume of data.

Filegroups should be used to split the data across multiple physical disks, where in this case a disk array is treated like a physical disk. So, if you've got a system with three different RAID channels (say a Compaq Proliant 8000) it makes sense to dump data onto two of them and put the transaction log on the third. If you know your data well enough to split the database into different filegroups to balance traffic across both data file groups you'll see a gain in performance: the disks will operate independently of each other, and you'll make the most of the available disk bandwidth.

rocketscientist.