Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 Database Design and Application Architecture
 How to prevent duplicate keys in archive database?

Author  Topic 

kimlee
Starting Member

2 Posts

Posted - 2014-08-20 : 10:38:37
Background: I'm working on a project where I have to make an archive database. The archive database should get all data of the operational database. It should even save every update ever made, so it literally contains the entire history of the operational database (this is a must and the whole project revolves around this idea). So this is solved by using Change Data Capture. After that the data should go through a staging area and eventually in the data warehouse database. I came out with a solution and worked it out in the prototype and it seemed to be working all fine. I stupidly forgot to include the foreign keys, so the archive database didn't have the original structure but it should ofcourse (no wonder it went okay without too much hassle).

Problem: Because we want to store everything in archive, there will be duplicate primary keys (just for instance, many same contact_id's because telephone number changes a couple of times). I thought to solve this by adding a new primary key which says is auto-increment and purely exist to make a record unique. But when it comes to foreign keys, it's impossible. You want contact_id to be able to be duplicate and in that case it cannot be a primary key. But foreign key is only able to reference to a primary key or another unique key but not other normal columns.

I want to use a script of the operational database to generate the archive database. Then it would have the same structure as the operational db. The difference between these two databases is that the archive should have two extra columns: id (the new primary key as described as above) and status (to tell which records are active and which ones are inactive). Contact_id should stay as reference as foreign key. I can't have composite key. The only solution I can think of is to have another column stored called "previous_id". So after each change the person gets a new contact_id (it's not a problem as the contact_id is used nowhere else other than the db). Then contact_id would remain unique again. But it's important that other tables who use contact_id as a foreign key reference the most recent contact_id. There are some issues with this workaround, so it's not ideal. I'm looking for a more simple solution to this.

Any advice on this? It's an absolute must to store all changes.

Bustaz Kool
Master Smack Fu Yak Hacker

1834 Posts

Posted - 2014-08-20 : 19:03:49
To use your example, can't you define a foreign key on contact_id from the archive to the contact table and then make a compound primary key on archive using contact_id and a uniqueifier, such as an identity column? The two tables won't have the same identical schema but, ultimately, they can't for the very reasons you have listed. The archive table has to allow multiple entries for the primary key of the parent table.



Too often we enjoy the comfort of opinion without the discomfort of thought. - John F. Kennedy
Go to Top of Page

kimlee
Starting Member

2 Posts

Posted - 2014-08-22 : 05:57:01
Hey, thanks for your reply.

I think that the problem with that solution is that the data in the archive still needs to be used later on in the DWH and the archive still needs to match correctly with the FK's then, so it cannot have a compound key. If I understand your post correctly that is.

I also posted this on another forum and people talked about slowly changing dimensions (type 2, maybe type 4). What is your opinion about that? I've never worked with any of this before and now I'm trying to work out a little test with it.
Go to Top of Page
   

- Advertisement -