For example, a question came up recently about modeling
the following entities in a "School" database:
- Students
- Teachers
- Parents
Each of those entities has many of the same attributes, such as first name, last name,
middle name, and birth date. Yet, we must separate them into multiple tables because we need to store and
track different data for students, teachers and parents: students have grades and
classes and parents; teachers have classes taught, skills, employment information,
and so on.
In addition to sharing common attributes, these entities also have common relations.
For example, for each of those entities we might also like to store addresses, phone
numbers, correspondence history, etc. To do this in a nicely normalized database, we would model that data by creating additional tables:
- StudentsAddresses
- TeachersAddresses
- ParentsAddresses
- StudentPhoneNumbers
- TeacherPhoneNumbers
- ParentPhoneNumbers
- StudentCorrespondance
- TeacherCorrespondance
- ParentCorrespondance
- ...etc...
On top of the redundant, similar tables, we would need a whole mess of redundant, similar
stored procedures to add/update/delete/select items from these tables. Yet,
again, we need different tables for these different entities because they each have
their own set of relations and attributes to track.
Is there an easier way to model this in a relational database? Absolutely -- let's
take a look.
Creating A "Base Table"
We can start by recognizing that Students, Teachers, and Parents are all "People",
and we can note that it makes sense to say that all People can have addresses and
phone numbers and correspondence history:
- People
- PeopleAddresses
- PeoplePhoneNumbers
- PeopleCorrespondence
In the People table, we'd store all of the common attributes of Students, Teachers
and Parents that we discussed earlier: name, birth date, and so on. We remove all
of these common columns from the Students, Teachers and Parents tables and put them
all in one place. Now, maintaining phone numbers, addresses, names, birthdays, and
correspondence can all be done with one set of generic stored procedures. The redundancy
of those activities has now been reduced, and any changes in phone number or address
formats can all be done in place. We can refer to the People table as a "base table".
Deriving Sub-Tables from the Base Table
Of course, we still need our Students, Teachers and Parents tables -- but now the
primary key of these tables also becomes a foreign key to the People table.
Because any row in the Students, Teachers or Parents tables require a related row
in the People table, and it also shares the same primary key as the People table
(i.e., it is a 1:1 relation), I'll refer to the Students, Teachers and Parents tables
as sub-tables of the People table. Think of these sub-tables as tables that extend
the basic information that the People table provides; this is similar to how inheritance
works in Object-Oriented Programming (OOP). Essentially, we are performing "table inheritance"
by doing this; since every student, teacher and parent is by definition also a "person"
and we are guaranteed that we can work with all of those entities the same way by
treating them as People if we want, or we can work with them using their specific
attributes and relations.
That gives us a schema like this:
create table People(PersonID int primary key, Name varchar ... etc ...)
create table Students(PersonID int primary key references People(PersonID), ...)
create table Teachers(PersonID int primary key references People(PersonID), ...)
create table Parents(PersonID int primary key references People(PersonID), ...)
With that particular schema, notice a few things:
- You cannot create a Student/Teacher/Parent until you first set up the "Person".
Once the person is created, you can now add a related row to the appropriate table,
depending what type they are.
- We still have our common data all in one place, but now we have tables set up to
store specific attributes and relations for these different types of People. A simple
JOIN from Students to People gives us the student's name, contact information, and
so on.
- This schema allows a person to be a student, teacher and parent -- all at once!
It also allows a person to be none of these -- you simply don't add a related row
in any of the sub-tables.
- If we want to output a "Type" column (Student, Teacher, Parent) when querying the
People table, we must outer join to all the 3 "sub-tables" and determine if any
matching rows exist.
That 3rd point is important, as it may not be what you want; we'll get back to that
in a moment. But for our purposes, I think it makes sense: suppose a person is initially
a student, comes back to work for the school as a teacher, and then eventually has
kids of their own. This schema allows us to handle that.
The 4th point is important to consider as well. We could add a "PersonType" column
to the table, and create a table of PersonTypes -- but then we introduce redundancies
and the possibility of conflicting data. Also, as mentioned, a person can be more
than one "type" at a time, so the existence of this column doesn't really make sense
in this table without changing our logical model.
Using this schema is easy; we can quickly get all of the information for all of
the students with a simple join:
select s.*, p.*
from students s
inner join people p on s.personID = p.personID
(you'd replace * with actual columns that you need, of course.)
So, everything works beautifully. We have our base table, we have our sub-tables,
and the inheritance allows us to work with the different entities all the same way,
but to also treat each of them specifically on their own as necessary.
Modeling One-to-Either Constraints
However, what if we want to set up our model so that a Person can have only one
type; i.e., either a Student, OR a teacher, OR a parent? This is a more accurate
database implementation of inheritance, since in OOP you cannot create an instance
of something that is more than one sub-class at a time -- you must pick one or the other
(or potentially the base class itself, if that's what you need).
Modeling this can be tricky, and it can be done a couple of ways. Here’s a technique
I like to use, it doesn’t require a trigger, it is easy to implement, and it uses
standard referential integrity without complicated check constraints.
First, create a table of PersonTypes:
create table PersonType (PersonTypeID int primary key, PersonType varchar(10))
And let’s insert the following values, one per “sub-table”:
insert into PersonType
select 1,'Student' union all
select 2,'Teacher' union all
select 3,'Parent'
This is the table that defines the different types our system will allow. There
will be one entry in this table per "sub-table" that we create. If we want to allow
for a Person to be just a generic "Person" without any specify sub-type, we could
add a row for "Other".
With that table in place, we will now add a PersonType column to our People table.
For this example, let’s only define a generic Name column for our People table:
create table People
(
PersonID int primary key,
PersonTypeID int references PersonType(PersonTypeID) not null,
Name varchar(10) not null
)
Pretty standard stuff so far – we have different types of People we are modeling,
and each Person must be assigned a type.
And now comes the trick: We will add an extra PersonType column to each of our sub-tables,
and force the value in that column to be equal to the PersonTypeID for that table.
In SQL 2000, we can just add a default and a check constraint, but in SQL 2005 we
can use a persisted computed column. So, our Students table will have a constant
PersonType value of 1, the Teachers table a value of 2, and the Parents table a
value of 3.
Then, we simply create a foreign key relation back to the People table – but instead
of just on the PersonID column, we create the constraint on both the PersonID and
the PersonTypeID columns. With these constraints in place, we are guaranteed that
Person will only have one type, and that they can only have data in the matching
sub-table for that type.
There is a catch, however: we cannot create our sub-tables just yet; if we try it,
we’ll get the following error:
create table Students
(
PersonID int primary key,
PersonTypeID as 1 persisted, -- student
EnrollmentDate datetime,
foreign key (PersonID, PersonTypeID) references People(PersonID, PersonTypeID)
)
Msg 1776, Level 16, State 0, Line 1
There are no primary or candidate keys in the referenced table 'People' that
match the referencing column list in the foreign key 'FK__Students__1F98B2C1'.
Msg 1750, Level 16, State 0, Line 1
Could not create constraint. See previous errors.
This is because the unique constraint on the People table is just on the PersonID
column, but we are trying to set up a foreign key constraint on the combination
of PersonID/PersonTypeID. To handle this, we simply add an additional unique constraint
to the People table, covering both PersonID and PersonTypeID:
drop table People
go
create table People
(
PersonID int primary key,
PersonTypeID int references PersonType(PersonTypeID),
Name varchar(10)
constraint People_AltPK unique (PersonID,PersonTypeID)
)
With that in place, we are good to go:
create table Students
(
PersonID int primary key,
PersonTypeID as 1 persisted, -- student
EnrollmentDate datetime,
foreign key (PersonID, PersonTypeID) references People(PersonID, PersonTypeID)
)
create table Teachers
(
PersonID int primary key,
PersonTypeID as 2 persisted, -- teacher
HireDate datetime,
foreign key (PersonID, PersonTypeID) references People(PersonID, PersonTypeID)
)
create table Parents
(
PersonID int primary key,
PersonTypeID as 3 persisted, -- parents
DifficultyScore int,
foreign key (PersonID, PersonTypeID) references People(PersonID, PersonTypeID)
)
Command(s) completed successfully.
Note: If you are using SQL Server 2000, instead of computed columns for the PersonTypeID
constants, you would instead use a default value and a check constraint, since it
does not allow computed columns to be used in foreign key constraints:
create table Teachers -- SQL 2000 Version
(
PersonID int primary key,
PersonTypeID int not null default 2 check (PersonTypeID = 2), -- teachers type
HireDate datetime,
foreign key (PersonID, PersonTypeID) references People(PersonID, PersonTypeID)
)
Now to test it. Let’s create a new Person, called "Bob", and make him a student.
First, we add a row to the People table, and then a matching row to the Student
table:
insert into People (PersonID, PersonTypeID, Name)
values (1,1,'Bob')
(1 row(s) affected)
insert into Students (PersonID, EnrollmentDate)
values (1,'2007-01-01')
(1 row(s) affected)
Notice that we cannot add teacher information for Bob, since he is a student:
insert into Teachers (PersonID, HireDate)
values (1,'2007-01-01')
Msg 547, Level 16, State 0, Line 1
The INSERT statement conflicted with the FOREIGN KEY constraint "FK__Teachers__31B762FC".
The conflict occurred in database "PlayGround", table "dbo.People".
The statement has been terminated.
And, of course, we cannot change Bob’s type from a student to a teacher, since
there is already a row in the Students table for him:
update People set PersonTypeID=2 where PersonID = 1
Msg 547, Level 16, State 0, Line 1
The UPDATE statement conflicted with the REFERENCE constraint "FK__Students__2645B050".
The conflict occurred in database "PlayGround", table "dbo.Students".
The statement has been terminated.
The only way to change Bob’s type is to remove the row from the Students table
first:
-- delete student attributes:
delete from Students where PersonID=1
-- change the type from a student to to a teacher
update People set PersonTypeID=2 where PersonID = 1
-- add teacher attributes:
insert into Teachers (PersonID, HireDate)
values (1,'2007-01-01')
(1 row(s) affected)
(1 row(s) affected)
(1 row(s) affected)
And there you have it! A "one-to-either" constraint, where each row in one table
has a related row in only one of several possible tables, without the need for any triggers or any complicated T-SQL programming to maintain data integrity.
Conclusion
Once things are in place, we have a simpler data model with fewer tables and stored
procedures, and we are guaranteed consistency when dealing with common attributes
and relations of "People" in the database. We can easily create new types of People
simply by adding a row to the PersonTypes table and then creating a new "sub-table".
As soon as the new sub-table is created, that new type immediately has all of the
attributes and relations of a Person – without coding up anything or making any
other changes to the database!