Partitioning the Data in a Table

By Sean Baird on 20 September 2000 | Tags: Database Design

Michael writes "1) Is it ever good database design practice (for speed sake, etc.) to essentially make copies of tables to hold a certain group of data?

For example, I have come across a database table that stores information for a housing subdivision; ie. lot number, lot size, lot price, etc. And the database to which this table belongs stores this data for many subdivisions. However, instead of having one table that stores the subdivision information for ALL subdivisions (and having some ID that represents the specific subdivision), this database has one table for each subdivision. For example, 'Clair Ridge Estates Subdivision Info' and another table 'Possum Bend Subdivision Info', etc, with each table having the exact same fields. And, if they needed another subdivision, they would make yet another copy and give it a unique name."

Yes, there are times this is a good idea. I like this question because it reminded me of one of my favorite features of SQL Server - partitioned views.

Michael -

What you're referring to is an optimization method called horizontal partitioning. That is, a table is split up into multiple smaller tables containing the same number of columns, but fewer rows. Compare this to vertical partitioning, in which the table is split into multiple smaller tables with the same number of rows, but fewer columns.

And yes, this design decision is often made to improve performance. Horizontally partitioning a table gives us some advantages:

Each partition table will have fewer rows; if you have to (heaven forbid) table-scan the data, it will take less time.
Indexes on each partition table will be smaller (=faster seeks) than a corresponding index on the unpartitioned table.
If you need to, you can put each partition table on a different filegroup and partition the data among multiple disks/RAID volumes/drive controllers.
If you're trying to whomp Oracle's TPC-C benchmark, then you may want to consider partitioning the data among multiple federated servers in SQL Server 2000. (Although for storing information about housing subdivisions, this may be a bit over the top.)
If you create a partitioned view on the partitioned tables, you can treat the view like it is the whole table, and the QP (query processor) will only touch the tables it needs to fulfill the query. You get the benefits of horizontal partitioning without the query headache.

So, as you can see, horizontal partitioning is all about splitting up the workload - spreading out data access among tables, indexes, disks, and servers.

Why would you want to do this? Well, maybe you have a big table - hundreds of millions of rows, for instance. Or maybe not so many rows, but large rows. Or maybe you have a table in a data warehouse that contains frequently and infrequently accessed rows. All of these situations are candidates for partitioning.

Now, there are two big downsides to all of this:

Unless you use a partitioned view to access the data, you'll have to build logic into your application to access the correct table, and that has a high suck factor. Please, please, use the partitioned view instead.
You actually have to partition the data. And maintain it. And create the partitioned view. And balance the amount of data in each partitioned table, if needed. In other words, the dreaded "administrative overhead".

Okay, since I'm touting the partitioned view, let me quickly explain how to create one. I'll use the information from Michael's question as an example.First, the tables:

CREATE TABLE Subdiv_ClaireRidgeEstates (SubdivID int, LotID int /*, etc.*/)
CREATE TABLE Subdiv_TibetianYakFarms (SubdivID int, LotID int /*, etc.*/)

You may notice that I included the Subdivision ID in each table. This is important; for the partitioned view to work most effectively, the QP must be able to know that each partition table will only contain a certain type of data. To do this, you need to build CHECK constraints on each table on the ID that you're partitioning on. Since you're partitioning the data by subdivision, you will build CHECK constraints on SubdivID:

ALTER TABLE Subdiv_ClaireRidgeEstates ADD CONSTRAINT CK_CRE_SubdivID CHECK (SubdivID = 42)
ALTER TABLE Subdiv_TibetianYakFarms ADD CONSTRAINT CK_TYF_SubdivID CHECK (SubdivID = 9538)

You could just as easily partition by using a surrogate key field and assigning a range of key values to each partition table. Or by partitioning on a date and using a range of dates for each partition value. Regardless, you still need those CHECK constraints in place on each table.

After actually creating the partition tables, distributing the data, and building the CHECK constraints, building the view is pretty easy. You just SELECT * from each partition table and use UNION ALL to combine the results of the query:

CREATE VIEW Subdivision
AS
    SELECT * FROM Subdiv_ClaireRidgeEstates
UNION ALL
    SELECT * FROM Subdiv_TibetianYakFarms

Now, if you've been following along with the example, try inserting some sample rows into each table:

INSERT Subdiv_ClaireRidgeEstates VALUES (42,9999)
INSERT Subdiv_TibetianYakFarms VALUES (9538,1234)

Now, turn on the "Show Execution Plan" option in query analyzer, and run the following queries:

SELECT * FROM Subdivision WHERE SubdivID = 42
SELECT * FROM Subdivision WHERE SubdivID = 9538
SELECT * FROM Subdivision

You'll notice that for the first two queries, SQL Server only pulls information from the required partition table. Only in the last query, where we don't filter by SubdivID, does the QP pull data from each partition table.

In SQL Server 7.0, you unfortunately cannot update data in a partitioned view. However, this IS possible in SQL Server 2000. Check out SQL Server Books Online (especially if you're going to use distributed partitioned views) for the do's and don'ts of partitioning data.

Michael actually had two very good questions. Answering this one wore me out, so I'll have to post the next one tomorrow. :)

UPDATED 9/20/00 - here is the followup question, which deals with splitting application databases into multiple similar databases for each client that uses the application.

-SQLGuru