Implementing Table Interfaces

By Jeff Smith on 19 May 2008 | Tags: Database Design , Table Design

Last time, we discussed Table inheritance, which allowed us to easily reduce redundancies in our table design by creating "base" or "super" tables that contain columns and relations that "sub-tables" automatically inherit.

That generally works well, but what if you just want to have several entities share a relation, but no common attributes? That is, the entities are not really the same type, and a base class wouldn't make a lot of sense. For example, suppose you are modeling Employees and Offices, and both entities can have multiple phone numbers that you'd like to store. Is there a simple way to create a data model for that without the need for redundant tables and code?

Table "Interfaces"

Continuing to use OOP terminology, I like to think of this situation as modeling "Table Interfaces". Unlike Table Inheritance, we don't want to create a base entity to be inherited by multiple entities, we just want some entities to "implement" the same interface. So, in our example, both Employees and Offices would implement the "PhoneNumbers" interface. This concept allows us to pick and choose which interfaces an entity has, whereas using inheritance means that all sub-tables always have the same columns/relations. We can add add/remove interfaces from different entities without worrying about affecting other entities that have the same base.

This is not as straight-forward in SQL, and there really is no direct, easy, standard way to handle this situation, unlike with Table Inheritance. Here's just one possible idea for you to consider if you need to implement this concept.

First, let's start with two tables, Employees and Offices:

create table Employees
(
	EmpID int primary key,
	EmpName varchar(100)
)

create table Offices
(
	OfficeID int primary key,
	OfficeName varchar(100)
)

We wish for our data model to allow both an Employee and an Office to have multiple phone numbers. Of course, to make our data meaningful and not contain just a random list of phone numbers, we will create a table of Phone Number Types:

create table PhoneNumberTypes
(
	PhoneTypeCode varchar(10) primary key,
	Description varchar(100),
	Sort int not null default 0
)

So far, so good. Now, here's where we have some options and we really need to think a little. Let's start by examining a common attempt to create a re-usable and generic "PhoneNumbers" table that I feel doesn't work so well.

A Common Approach

Many database architects will start by creating a table of phone numbers, and then they'd create two more tables: one to relate PhoneNumbers to Offices, and another to relate PhoneNumbers to Employees:

create table PhoneNumbers
(
	PhoneID int identity primary key,
	AreaCode char(3),
	Exchange char(3),
	Number char(4),
	Extension varchar(10)
)

create table EmployeePhoneNumbers
(
	EmpID int references Employees(EmpID) on delete cascade,
	PhoneID int refernces PhoneNumbers(PhoneID),
	PhoneType varchar(10) references PhoneNumberTypes(PhoneTypeCode),
	primary key (EmpID, PhoneType)
)

create table OfficePhoneNumbers
(
	OfficeID int references Offices(OfficeID) on delete cascade,
	PhoneID int references PhoneNumbers(PhoneID),
	PhoneTypeCode varchar(10) references PhoneNumberTypes(PhoneTypeCode)
	primary key (OfficeID, PhoneType)
)

However, there are several problems with this situation.

Our EmployeePhoneNumbers and OfficePhoneNumbers tables are redundant, which is the problem we are trying to avoid. We may be able to use one set of stored procedures for update and delete operations, but we still need separate stored procedures for inserts and selects.
Creating a table just for PhoneNumbers and relating PhoneNumbers to Offices and/or Employees establishes phone numbers as an entity, not an attribute. Unless we are keeping track of specific phone numbers over time, and who is assigned which phone number, and who "shares" phone numbers, a phone number is not an entity, it is just an attribute. This is a bad design because introducing a "phoneID" is unnecessary and confusing. Consider changing a phone number for an employee: if PhoneNumbers are attributes, we just change the area code or extension or whatever we need to do by directly updating a row in a table. However, if it is an entity, then we should create a new phone number entity, and then assign the employee the new "phoneID" created instead of the old one, and then delete the old phone number (perhaps? Or do we keep it around?) That really should not be necessary -- we should just be able to update Bob's extension directly, right?
This design will not cascade deletes to the PhoneNumber table when an Office or Employee phone number is deleted; that must be done in a separate step. Thus, it is very easy to have orphan phone numbers that have no one assigned to them.
No constraints are in place to disallow an Office and an Employee from sharing the same PhoneID.

So, I would recommend avoiding this design as it complicates the situation without much benefit. If you're going to do this, you might has well just store the Phone Number data itself in the EmployeePhoneNumbers and OfficePhoneNumbers tables and get rid of the PhoneNumbers table completely.

A More Generic, Flexible Alternative

Instead, consider this approach:

create table PhoneNumbers

(
	EmpID int references Employees(EmpID) on delete cascade,
	OfficeID int references Offices(OfficeID) on delete cascade,

	PhoneTypeCode varchar(10) references PhoneNumberTypes(PhoneTypeCode) not null
	
	AreaCode char(3),
	Exchange char(3),
	Number char(4),
	Extension varchar(10),
	
	check (case when EmpId is null then 0 else 1 end + 
	        case when OfficeID is null then 0 else 1 end = 1),
	unique constraint PhoneNumbers_UC (EmpID, OfficeID, PhoneTypeCode)
)

Let's a take minute to dissect that. We have created a table of Phone Numbers, with columns that allow us to relate the phone number to an Employee and/or an Office, but neither is required. We also have a required PhoneTypeCode column, and we have the standard phone number attributes in there as well. We could easily add a notes column, or description, or last updated/modified columns as well.

Notice the check constraint: that expression is simply ensuring that only EmpID or OfficeID is not null, but not both. There is no "exclusive or" boolean operator in T-SQL, so this is one easy way to handle it using CASE expressions that return 0 if a column is null or 1 if it has a non-null value. If we wrote:

	check (EmpID is not null OR OfficeID is not null)

Then the table would allow an entry in both the EmpID and OfficeID columns, which we do not want. Each row in this table will relate to only an office or an employee. Plus, we can easily add more columns to this check constraint as necessary (more on that later).

Next, instead of a standard primary key, we have a unique constraint on the columns EmpID/OfficeID/PhoneTypeCode. This ensures that only one row per EmpID/OfficeID/PhoneTypeCode is allowed. A primary key constraint cannot allow NULL values in any of the primary key columns, so we cannot use that type of constraint. If you recall, either EmpID or OfficeID is always null, but never both. Since PhoneTypeCode is never null, this constraint effectly ensures that there are never two entries for a single EmpID/PhoneTypeCode or an OfficeID/PhoneTypeCode.

With that, we are done. We can now write a single stored procedure that allows us add a PhoneNumber to either an Office or an Employee like this:

create procedure PhoneNumberAdd
	@OfficeID int = null,
	@EmpID int = null,
	@PhoneTypeCode varchar(10),
	@AreaCode char(3),
	@Exchange char(3),
	@Number char(4),
	@Extension varchar(10)
as
	insert into PhoneNumbers (OfficeID, EmpID, PhoneTypeCode, AreaCode, Exchange, Number, Extension)
	values (@OfficeID, @EmpID, @PhoneTypeCode, @AreaCode, @Exchange, @Number, @Extension)

This procedure effectively requires that you pass either an EmpID or an OfficeID, since if both are null, or neither is not null, the INSERT will fail.

Writing a procedure that returns the phone number for an entity that "implements" the PhoneNumber interface is a slightly more complicated, but still quite simple. We must write our criteria expression to check for the possibility that either an @EmpID or an @OfficeID is passed in, since both conditions are possible. Thus, that results in something like this:

create procedure PhoneNumberSelect 
	@OfficeID int = null,
	@EmpID int = null
as
	select 
	    p.OfficeID, p.EmpID, p.PhoneTypeCode, 
	    pt.Description as PhoneTypeDescription, 
	    p.AreaCode, p.Exchange, p.Number, p.Extension
	from	
	    PhoneNumbers p
	inner join	
	    PhoneTypes pt on p.PhoneTypeCode = pt.PhoneTypeCode
	where	
	    EmpID = @EmpID or OfficeID = @OfficeID
	order by
	    pt.Sort

Again, effectively this requires that you pass a value for either the @OfficeID or the @EmpID parameter, but not both, and the matching data will be returned. We don't have to worry about issues where NULL != NULL because we never want to return a row where both a parameter value and a column value is NULL -- we only want to return rows where both are not null, and therefore the comparison will return TRUE. We could of course also put a @PhoneTypeID parameter in there as well if you only wanted to return a single, specific phone number of the specified type.

Update and Delete will work in much the same way:

create procedure PhoneNumberUpdate 
	@OfficeID int = null,
	@EmpID int = null,
	@PhoneTypeCode varchar(10),
	@AreaCode char(3),
	@Exchange char(3),
	@Number char(4),
	@Extension varchar(10),
as
	update 
	    PhoneNumbers
	set 
	    AreaCode=@areacode, 
	    Exchange=@exchange, 
	    Number=@number, 
	    Extension=@extension
	where	
	    (PhoneTypeCode = @PhoneTypeCode) and 
	    (EmpID = @EmpID or OfficeID = @OfficeID)

create procedure PhoneNumberDelete
	@OfficeID int = null,
	@EmpID int = null,
	@PhoneTypeCode varchar(10)
as
	delete from PhoneNumbers
	where	
	    (PhoneTypeCode = @PhoneTypeCode) and 
	    (EmpID = @EmpID or OfficeID = @OfficeID)

Calling our code is very easy. From within T-SQL or from a client, we just set the parameter that corresponds to the entity we are referencing:

    exec PhoneNumberDelete @OfficeID=2, @PhoneTypeCode='HOME'
   
    exec PhoneNumberDelete @EmpID=45, @PhoneTypeCode='WORK'
   
    ... etc ...

Implementing the Interface

With all this in place, we now can use one set of stored procedures to Add/Select/Update/Delete phone numbers. The only variable is the parameter that we pass in to the stored procedure, which indicates the entity that we are referencing.

That's all fine and dandy, but the question is: how would we set up a new entity to implement the "Phone Number" interface? For example, suppose that we now decide that we need to track "Contacts", and Contacts will have multiple phone numbers. (Please note that we could argue that Employees and Contacts should simply inherit from the same base class, and thus an interface is not necessary, but let's ignore that for the sake of this example.)

Per usual, we would start by creating a table of Contacts:

create table Contacts
(
	ContactID int primary key,
	ContactName varchar(100)
)

Now, it is time to "implement" the interface. Where do we begin? Well, it is actually done rather backwards. Normally, you alter the class you have created so that it implements a particular interface, but in this case, we need to alter the interface so that it works with the class. So, we actually don't do anything to the Contacts table -- we alter the PhoneNumbers table and the corresponding PhoneNumber stored procedures. This is a bit of a drawback of this approach, and a big reason why implementing an interface in a relational database is not as clean as implementing inheritance.

So, the steps we must take are:

Add a contactID column to the PhoneNumbers table, which is an foreign key reference back to the Contact table's ContactID column.
Alter the unique constraint on the PhoneNumbers table to include the ContactID column.
Alter the check constraint on the PhoneNumbers table to include the ContactID column.

Having done that, we now need to edit each of the stored procedures to include contactID as an optional parameter, and to include contactID throughout the SELECT and WHERE clauses.

It sounds like a lot of work, but it's really not too bad and can be done quite quickly. The "template" we've created for this interface is very easy to maintain. Despite the maintenance required, the overall process is pretty nice because:

This doesn't break any existing code -- everything already written to use the existing stored procedures will work just fine without any modification.
Once this is done, there is nothing else to do -- Contacts now completely implement the "Phone Numbers" interface, and you do not need to create any additional stored procedures to allow for your application to maintain multiple phone numbers for Contacts. It "just works"!

The "Phone Entity" Approach

Another option is to create one table with foreign key references back to the tables that "implement" your interface, and with an identity primary key on that table, and then put all of the actual data in another table. In other words, you might have one table that just defines "PhoneEntities", with links back to the parents, and then we put the actual address data in a table with a primary key of (PhoneEntityId/PhoneTypeCode).

For example, you might have a PhoneEntities table like this:

create table PhoneEntities
(
    PhoneEntityID int identity primary key,
    EmpID int references Employees(EmpID) on delete cascade,
    OfficeID int references Offices(OfficeID) on delete cascade,

    check (case when EmpId is null then 1 else 0 end + 
	    case when OfficeID is null then 0 else 1 end = 0),
	    
    unique constraint PhoneEntities_UC (EmpID, OfficeID)
)

Notice that this table just stores a reference back to either an Office or an Employee, and our unique constraint ensures that there is only one value in this table for either. It is a bit simpler because we aren't worry about PhoneTypes here, or actual phone numbers. That data is stored in a more traditional PhoneNumbers table, with a foreign key reference to the PhoneNumberEntity table:

create table PhoneNumbers
(
    PhoneEntityID int references PhoneEntities(PhoneEntityID) on delete cascade,
    PhoneTypeCode varchar(10) references PhoneTypes(PhoneTypeCode),
    AreaCode char(3),
    Exchange char(3),
    Number char(4),
    Extension varchar(10),
    primary key (PhoneEntityID, PhoneTypeCode)
)

That allows the PhoneNumbers table to be a more "normal" looking, and a bit simpler as well. This does complicate the stored procedures a little, since first you need to determine if a particular entity already has an entry in the PhoneEntities table, and if not then you need to add one in.

For example, with that schema, here's a PhoneNumberAdd stored procedure:

create procedure PhoneNumberAdd
	@OfficeID int = null,
	@EmpID int = null,
	@PhoneTypeCode varchar(10),
	@AreaCode char(3),
	@Exchange char(3),
	@Number char(4),
	@Extension varchar(10)
as
    declare @EntityID int
    set @EntityID = (select PhoneEntityID from PhoneEntities where Office=@OfficeID or EmpID = @EmpID)
	
    if (@EntityID is null)
    begin
        insert into PhoneEntities (EmpID, OfficeID)
        values (@EmpID, @OfficeID)
        set @EntityID = scope_identity()
    end

    insert into PhoneNumbers (EntityID, PhoneTypeCode, AreaCode, Exchange, Number, Extension)
    values (@EntityID, @PhoneTypeCode, @AreaCode, @Exchange, @Number, @Extension)

This approach may or may not work better, depending on your specific situation. As you can see, the logic in the stored procedures can potentially get a little complicated since now we have two tables to deal with. But, this does makes the PhoneNumbers table itself much simpler, so you may get better performance with this approach.

Conclusion

I'll be completely honest here: this isn't a really clean and simple technique to use, it is really one of those subjective gray areas where there does not appear to be a standard method or best practice. I would love to hear feedback about alternate suggestions; I presented only a few possible ideas, and I am sure there are alternatives that may be superior in different situations. I have found that these work quite well, and again they are nice because we have full referential integrity, cascading deletes, and there are no triggers required. However, I also recognize not all techniques work in every situation, so the ideas presented here may not be applicable in many cases.

Overall these methods can work quite well depending on what you are modeling. With these techniques in place, if you decide to change the columns used to store phone numbers, you do it all in one place and your done. Or, if you decide to add "UpdatedAt" and "UpdatedBy" audit columns to track changes in the phone numbers, again you just need to do it all in one place. Every entity you have that "implements" the phone number interface will instantly get all those changes. And, most importantly, we will never have orphan phone numbers due to full support for cascading deletes, we don't have to worry about any data integrity issues since it all checks out, and no triggers were necessary for any of this.

I look forward to hearing ideas and suggestions in the comments.