Return to Implementing Table Interfaces
Implementing Table Interfaces
Written by Jeff Smith on 19 May 2008
Last time, we discussed Table inheritance, which allowed us to easily reduce redundancies in our table design by creating "base" or "super" tables that contain columns and relations that "sub-tables" automatically inherit.
That generally works well, but what if you just want to have several entities share a relation, but no common attributes? That is, the entities are not really the same type, and a base class wouldn't make a lot of sense. For example, suppose you are modeling Employees and Offices, and both entities can have multiple phone numbers that you'd like to store. Is there a simple way to create a data model for that without the need for redundant tables and code?
Table "Interfaces"
Continuing to use OOP terminology, I like to think of this situation as modeling
"Table Interfaces". Unlike Table Inheritance, we don't want to create a base entity
to be inherited by multiple entities, we just want some entities to "implement"
the same interface. So, in our example, both Employees and Offices would implement
the "PhoneNumbers" interface. This concept allows us to pick and choose which interfaces
an entity has, whereas using inheritance means that all sub-tables always have the
same columns/relations. We can add add/remove interfaces from different entities
without worrying about affecting other entities that have the same base.
This is not as straight-forward in SQL, and there really is no direct, easy, standard
way to handle this situation, unlike with Table Inheritance. Here's just one possible
idea for you to consider if you need to implement this concept.
First, let's start with two tables, Employees and Offices:
create table Employees
(
EmpID int primary key,
EmpName varchar(100)
)
create table Offices
(
OfficeID int primary key,
OfficeName varchar(100)
)
We wish for our data model to allow both an Employee and an Office to have multiple
phone numbers. Of course, to make our data meaningful and not contain just a random
list of phone numbers, we will create a table of Phone Number Types:
create table PhoneNumberTypes
(
PhoneTypeCode varchar(10) primary key,
Description varchar(100),
Sort int not null default 0
)
So far, so good. Now, here's where we have some options and we really need to think
a little. Let's start by examining a common attempt to create a re-usable and generic
"PhoneNumbers" table that I feel doesn't work so well.
A Common Approach
Many database architects will start by creating a table of phone numbers, and then
they'd create two more tables: one to relate PhoneNumbers to Offices, and another
to relate PhoneNumbers to Employees:
create table PhoneNumbers
(
PhoneID int identity primary key,
AreaCode char(3),
Exchange char(3),
Number char(4),
Extension varchar(10)
)
create table EmployeePhoneNumbers
(
EmpID int references Employees(EmpID) on delete cascade,
PhoneID int refernces PhoneNumbers(PhoneID),
PhoneType varchar(10) references PhoneNumberTypes(PhoneTypeCode),
primary key (EmpID, PhoneType)
)
create table OfficePhoneNumbers
(
OfficeID int references Offices(OfficeID) on delete cascade,
PhoneID int references PhoneNumbers(PhoneID),
PhoneTypeCode varchar(10) references PhoneNumberTypes(PhoneTypeCode)
primary key (OfficeID, PhoneType)
)
However, there are several problems with this situation.
- Our EmployeePhoneNumbers and OfficePhoneNumbers tables are redundant, which is the
problem we are trying to avoid. We may be able to use one set of stored procedures
for update and delete operations, but we still need separate stored procedures for
inserts and selects.
- Creating a table just for PhoneNumbers and relating PhoneNumbers to Offices and/or
Employees establishes phone numbers as an entity, not an attribute.
Unless we are keeping track of specific phone numbers over time, and who is assigned
which phone number, and who "shares" phone numbers, a phone number is not an entity, it is just an attribute. This
is a bad design because introducing a "phoneID" is unnecessary and confusing. Consider
changing a phone number for an employee: if PhoneNumbers are attributes, we just
change the area code or extension or whatever we need to do by directly updating
a row in a table. However, if it is an entity, then we should create a new phone number entity,
and then assign the employee the new "phoneID" created instead of the old one, and
then delete the old phone number (perhaps? Or do we keep it around?) That really
should not be necessary -- we should just be able to update Bob's extension directly,
right?
- This design will not cascade deletes to the PhoneNumber table when an Office or
Employee phone number is deleted; that must be done in a separate step. Thus, it
is very easy to have orphan phone numbers that have no one assigned to them.
-
No constraints are in place to disallow an Office and an Employee from sharing the same PhoneID.
So, I would recommend avoiding this design as it complicates the situation without
much benefit. If you're going to do this, you might has well just store the Phone Number data itself in the EmployeePhoneNumbers and OfficePhoneNumbers tables
and get rid of the PhoneNumbers table completely.
A More Generic, Flexible Alternative
Instead, consider this approach:
create table PhoneNumbers
(
EmpID int references Employees(EmpID) on delete cascade,
OfficeID int references Offices(OfficeID) on delete cascade,
PhoneTypeCode varchar(10) references PhoneNumberTypes(PhoneTypeCode) not null
AreaCode char(3),
Exchange char(3),
Number char(4),
Extension varchar(10),
check (case when EmpId is null then 0 else 1 end +
case when OfficeID is null then 0 else 1 end = 1),
unique constraint PhoneNumbers_UC (EmpID, OfficeID, PhoneTypeCode)
)
Let's a take minute to dissect that. We have created a table of Phone Numbers, with
columns that allow us to relate the phone number to an Employee and/or an Office,
but neither is required. We also have a required PhoneTypeCode column, and we have
the standard phone number attributes in there as well. We could easily add a notes
column, or description, or last updated/modified columns as well.
Notice the check constraint: that expression is simply ensuring that only EmpID
or OfficeID is not null, but not both. There is no "exclusive or" boolean operator
in T-SQL, so this is one easy way to handle it using CASE expressions that return
0 if a column is null or 1 if it has a non-null value. If we wrote:
check (EmpID is not null OR OfficeID is not null)
Then the table would allow an entry in both the EmpID and OfficeID columns, which
we do not want. Each row in this table will relate to only an office or
an employee. Plus, we can easily add more columns to this check constraint as necessary
(more on that later).
Next, instead of a standard primary key, we have a unique constraint on the columns
EmpID/OfficeID/PhoneTypeCode. This ensures that only one row per EmpID/OfficeID/PhoneTypeCode
is allowed. A primary key constraint cannot allow NULL values in any of the primary
key columns, so we cannot use that type of constraint. If you recall, either EmpID
or OfficeID is always null, but never both. Since PhoneTypeCode is never null, this
constraint effectly ensures that there are never two entries for a single EmpID/PhoneTypeCode
or an OfficeID/PhoneTypeCode.
With that, we are done. We can now write a single stored procedure that allows us
add a PhoneNumber to either an Office or an Employee like this:
create procedure PhoneNumberAdd
@OfficeID int = null,
@EmpID int = null,
@PhoneTypeCode varchar(10),
@AreaCode char(3),
@Exchange char(3),
@Number char(4),
@Extension varchar(10)
as
insert into PhoneNumbers (OfficeID, EmpID, PhoneTypeCode, AreaCode, Exchange, Number, Extension)
values (@OfficeID, @EmpID, @PhoneTypeCode, @AreaCode, @Exchange, @Number, @Extension)
This procedure effectively requires that you pass either an EmpID or an OfficeID,
since if both are null, or neither is not null, the INSERT will fail.
Writing a procedure that returns the phone number for an entity that "implements"
the PhoneNumber interface is a slightly more complicated, but still quite simple.
We must write our criteria expression to check for the possibility that either an
@EmpID or an @OfficeID is passed in, since both conditions are possible. Thus, that
results in something like this:
create procedure PhoneNumberSelect
@OfficeID int = null,
@EmpID int = null
as
select
p.OfficeID, p.EmpID, p.PhoneTypeCode,
pt.Description as PhoneTypeDescription,
p.AreaCode, p.Exchange, p.Number, p.Extension
from
PhoneNumbers p
inner join
PhoneTypes pt on p.PhoneTypeCode = pt.PhoneTypeCode
where
EmpID = @EmpID or OfficeID = @OfficeID
order by
pt.Sort
Again, effectively this requires that you pass a value for either the @OfficeID
or the @EmpID parameter, but not both, and the matching data will be returned. We
don't have to worry about issues where NULL != NULL because we never want to return
a row where both a parameter value and a column value is NULL -- we only want to
return rows where both are not null, and therefore the comparison will return TRUE.
We could of course also put a @PhoneTypeID parameter in there as well if you only
wanted to return a single, specific phone number of the specified type.
Update and Delete will work in much the same way:
create procedure PhoneNumberUpdate
@OfficeID int = null,
@EmpID int = null,
@PhoneTypeCode varchar(10),
@AreaCode char(3),
@Exchange char(3),
@Number char(4),
@Extension varchar(10),
as
update
PhoneNumbers
set
AreaCode=@areacode,
Exchange=@exchange,
Number=@number,
Extension=@extension
where
(PhoneTypeCode = @PhoneTypeCode) and
(EmpID = @EmpID or OfficeID = @OfficeID)
create procedure PhoneNumberDelete
@OfficeID int = null,
@EmpID int = null,
@PhoneTypeCode varchar(10)
as
delete from PhoneNumbers
where
(PhoneTypeCode = @PhoneTypeCode) and
(EmpID = @EmpID or OfficeID = @OfficeID)
Calling our code is very easy. From within T-SQL or from a client, we just set the parameter that corresponds to the entity we are referencing:
exec PhoneNumberDelete @OfficeID=2, @PhoneTypeCode='HOME'
exec PhoneNumberDelete @EmpID=45, @PhoneTypeCode='WORK'
... etc ...
Implementing the Interface
With all this in place, we now can use one set of stored procedures to Add/Select/Update/Delete
phone numbers. The only variable is the parameter that we pass in to the stored
procedure, which indicates the entity that we are referencing.
That's all fine and dandy, but the question is: how would we set up a new entity
to implement the "Phone Number" interface? For example, suppose that we now decide
that we need to track "Contacts", and Contacts will have multiple phone numbers.
(Please note that we could argue that Employees and Contacts should simply inherit
from the same base class, and thus an interface is not necessary, but let's ignore
that for the sake of this example.)
Per usual, we would start by creating a table of Contacts:
create table Contacts
(
ContactID int primary key,
ContactName varchar(100)
)
Now, it is time to "implement" the interface. Where do we begin? Well, it is actually
done rather backwards. Normally, you alter the class you have created so that it
implements a particular interface, but in this case, we need to alter the interface
so that it works with the class. So, we actually don't do anything to the Contacts
table -- we alter the PhoneNumbers table and the corresponding PhoneNumber stored
procedures. This is a bit of a drawback of this approach, and a big reason why implementing
an interface in a relational database is not as clean as implementing inheritance.
So, the steps we must take are:
- Add a contactID column to the PhoneNumbers table, which is an foreign key reference
back to the Contact table's ContactID column.
- Alter the unique constraint on the PhoneNumbers table to include the ContactID column.
- Alter the check constraint on the PhoneNumbers table to include the ContactID column.
Having done that, we now need to edit each of the stored procedures to include contactID
as an optional parameter, and to include contactID throughout the SELECT and WHERE
clauses.
It sounds like a lot of work, but it's really not too bad and can be done quite
quickly. The "template" we've created for this interface is very easy to maintain.
Despite the maintenance required, the overall process is pretty nice because:
- This doesn't break any existing code -- everything already written to use the existing
stored procedures will work just fine without any modification.
- Once this is done, there is nothing else to do -- Contacts now completely implement
the "Phone Numbers" interface, and you do not need to create any additional stored
procedures to allow for your application to maintain multiple phone numbers for
Contacts. It "just works"!
The "Phone Entity" Approach
Another option is to create one table with foreign key references back to the tables that "implement" your interface, and with an identity primary key
on that table, and then put all of the actual data in another table. In other words, you might have one table that just defines "PhoneEntities", with links back to the
parents, and then we put the actual address data in a table with a primary key of (PhoneEntityId/PhoneTypeCode).
For example, you might have a PhoneEntities table like this:
create table PhoneEntities
(
PhoneEntityID int identity primary key,
EmpID int references Employees(EmpID) on delete cascade,
OfficeID int references Offices(OfficeID) on delete cascade,
check (case when EmpId is null then 1 else 0 end +
case when OfficeID is null then 0 else 1 end = 0),
unique constraint PhoneEntities_UC (EmpID, OfficeID)
)
Notice that this table just stores a reference back to either an Office or an Employee, and our unique constraint ensures that there is only one
value in this table for either. It is a bit simpler because we aren't worry about PhoneTypes here, or actual phone numbers. That data is stored in a more traditional PhoneNumbers table, with a foreign key reference to the PhoneNumberEntity table:
create table PhoneNumbers
(
PhoneEntityID int references PhoneEntities(PhoneEntityID) on delete cascade,
PhoneTypeCode varchar(10) references PhoneTypes(PhoneTypeCode),
AreaCode char(3),
Exchange char(3),
Number char(4),
Extension varchar(10),
primary key (PhoneEntityID, PhoneTypeCode)
)
That allows the PhoneNumbers table to be a more "normal" looking, and a bit simpler as well. This does complicate the stored procedures a little, since
first you need to determine if a particular entity already has an entry in the PhoneEntities table, and if not then you need to add one in.
For example, with that schema, here's a PhoneNumberAdd stored procedure:
create procedure PhoneNumberAdd
@OfficeID int = null,
@EmpID int = null,
@PhoneTypeCode varchar(10),
@AreaCode char(3),
@Exchange char(3),
@Number char(4),
@Extension varchar(10)
as
declare @EntityID int
set @EntityID = (select PhoneEntityID from PhoneEntities where Office=@OfficeID or EmpID = @EmpID)
if (@EntityID is null)
begin
insert into PhoneEntities (EmpID, OfficeID)
values (@EmpID, @OfficeID)
set @EntityID = scope_identity()
end
insert into PhoneNumbers (EntityID, PhoneTypeCode, AreaCode, Exchange, Number, Extension)
values (@EntityID, @PhoneTypeCode, @AreaCode, @Exchange, @Number, @Extension)
This approach may or may not work better, depending on your specific situation. As you can see, the logic in the stored procedures can potentially get
a little complicated since now we have two tables to deal with. But, this does makes the PhoneNumbers table itself much simpler, so you may get better performance with this approach.
Conclusion
I'll be completely honest here: this isn't a really clean and simple technique to
use, it is really one of those subjective gray areas where there does not appear
to be a standard method or best practice. I would love to hear feedback about alternate
suggestions; I presented only a few possible ideas, and I am sure there are alternatives that may
be superior in different situations. I have found that these work quite well, and
again they are nice because we have full referential integrity, cascading deletes,
and there are no triggers required. However, I also recognize not all techniques work in every situation, so
the ideas presented here may not be applicable in many cases.
Overall these methods can work quite well depending on what you are modeling. With these techniques
in place, if you decide to change the columns used to store phone numbers, you do
it all in one place and your done. Or, if you decide to add "UpdatedAt" and "UpdatedBy"
audit columns to track changes in the phone numbers, again you just need to do it
all in one place. Every entity you have that "implements" the phone number interface
will instantly get all those changes. And, most importantly, we will never have
orphan phone numbers due to full support for cascading deletes, we don't have to
worry about any data integrity issues since it all checks out, and no triggers were
necessary for any of this.
I look forward to hearing ideas and suggestions in the comments.
|