Article: Identity and Primary Keys

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

Site Related Forums

Article Discussion

Article: Identity and Primary Keys

Author

Topic

AskSQLTeam
Ask SQLTeam Question

0 Posts

Posted - 2001-02-11 : 01:05:51

If you are an Access user, or migrated from Access to SQL Server, you may think that a primary key and an Identity/Autonumber column are the same thing. Well, they’re not, and there are some very good reasons NOT to use an Identity column for the primary key on your tables.

Article Link.

Ron Soukup
Starting Member

2 Posts

Posted - 2001-11-30 : 19:56:40

I am not commenting on the article altogether, but only a quote attributed to me, which I have never made. I certainly believe that sometimes Identity is a perfectly good choice as a PK. I have never said or believed otherwise. (Identity is not necessarily a great choice as your clustering index - but thats another discussion, and thats a physical issue, not a logical one. And you certainly could have your PK be on your identity column, and your clustered index on something else in any case.)

The author says he is 'paraphrasing' me. All I can think of is that he is referring to a comment I wrote in Inside SQL Server, where I decry the use of cursors purely to mimic ISAM style behavior with a series of nested selects. I said that people often do this as a quick and dirty way to port an ISAM app to SQL Server, but I viewed that approach only for people who think there is not enough time to do things right, but always enough time to do it over. My comment had nothing to do with identity as a PK choice. His paraphrasing of a quote of mine is towards an entirely different subject.

A better tongue in cheek saying that I'd use in this discussion, where I think Identity CAN be a perfectly suitable PK (but as always it depends), is "Generalizations are always wrong." Regards,
Ron

Edited by - Ron Soukup on 11/30/2001 20:05:15

Edited by - Ron Soukup on 12/01/2001 12:24:25

Edited by - Ron Soukup on 12/01/2001 12:25:08

Edited by - Ron Soukup on 12/01/2001 12:26:38

nr
SQLTeam MVY

12543 Posts

Posted - 2001-11-30 : 23:29:22

I think it was the choice of identity as a clustered index.
I'm pretty sure it did turn up a few times - can't remember where though. I've certainly seen it stated that you should never make an identity a clustered index - but not since v7 came out for some reason.
Bit of a shame that doesn't still appear as it used to be a good interview question for people who had just read a few books.
It may have originated from avoiding hot spots in oltp databases.

==========================================
Cursors are useful if you don't know sql.
Beer is not cold and it isn't fizzy.

robvolk
Most Valuable Yak

15732 Posts

Posted - 2001-12-01 : 16:51:20

quote:
The author says he is 'paraphrasing' me...My comment had nothing to do with identity as a PK choice. His paraphrasing of a quote of mine is towards an entirely different subject.

First things first: Mr. Soukup is 100% correct, and I apologize for the context in which his quote was used. It was indeed taken from Inside SQL Server 6.5. As with just about everything in the article, the intention for using his words didn't translate in the final text; that failure is entirely mine, not Ron's.

I hope this didn't cause you undue grief Ron, and hope that you can believe it was meant in the spirit of "imitation is the sincerest form of flattery", which WAS my intention, honestly! Inside SQL Server 6.5 was the first SQL Server book I bought, and it's been a tremendous resource for me even today, when I only use 7.0 and 2000. Thank you.

At the risk of (once again) making my point badly, my reasoning for using Ron's quote went something like this: if you are like me, and MS Access was your first serious DB, AND you are going to SQL Server, you owe it to yourself to review your table designs BEFORE you move them up.

Those convenient "autonumber" columns that Access puts in for you may NOT be a good idea in a SQL Server database, as I learned much later at some cost. The inertia behind changing a design that "works", even if it's not a good one, was how I interpreted Ron's words, and I felt that simply upsizing Access tables was "a quick and dirty way to port an ISAM app to SQL Server", and that it was "for people who think there is not enough time to do things right, but always enough time to do it over".

Once again, my apologies for making a bad presentation of some excellent advice from Ron Soukup. As far as primary keys are concerned, I am forever keeping my mouth shut!

mono
Starting Member

36 Posts

Posted - 2001-12-02 : 07:41:58

When Celko says you haven't got a table if you haven't got a key, I understand him to mean that you have to be able to identify each record uniquely. That certainly makes sense.

If you have a table with duplicates, then you have a problem. Numbering the rows e.g. by adding an autonumber/identity column allows you to identify each row uniquely but: it doesn't solve the core problem (addressed in the article): how do you differentiate between "Smith, John" in row 27 and "Smith, John" in row 345?

Identity columns should not be used as a way of making each row unique. The rows have to be uniquely identifiable without the identity column. Adding an identity then makes sense as a way to speed up joins, as several commentators have remarked.

I deal with school data across multiple schools. We have a lot of John Smiths. If someone wants the attendance data for a particular John Smith, they have to be able to identify that John Smith. In the real world, users do not see (should not be burdened with) our techie internal identity values so they will never say "the data for the John Smith with PK=1234567".

They will say "the data for John Smith in Class 5b, CityVille High School". Now if there are two of those (not uncommon), there has to be some way to tell them apart, some way accessible to users. In the real world, often but not always there is a Roll Number or Student ID number (the number that appears on a student's ID card, not the Identity value in the table!). The Roll No is guaranteed to be unique. It is therefore the ideal PK. For the same reason Social Security Number (National Insurance number in the UK) is a popular PK.

But the Roll No is not something known to most users including teachers (only the school admin people deal with roll nos) and sometimes there is no such convenient unique indentifier (passport numbers would be great, but they are not available to most users and probably illegal for this purpose).

So, until the UN decides to allocate a public GUID to each human at birth, we have to use a combination of two or more of name, middle initial, date of birth, house number etc that makes the probability of confusion (duplicates) negligeable. The probability of two students sharing forename, surname, middle initial and DoB is actually small, but maybe nt small enough. Adding house number should reduce it by a factor of 10 or more. Perhaps adding the name of the mother would make it so enough and so on, until you the expected number of dupicates over the lifetime of your system is small enough to live with.

Whatever combination we decide is going to be our way of uniquely identifying students should de facto have a unique constraint added. You may, once a blue moon, get a customer complaining they can't add a student because the system says the student is already in there. It is our 'natural' PK, but it's a real pain as a Foreign Key (FK), so, after adding our uniqueness constraint, we add an Identity column that functions as a kind of stand-in for our lengthy multi-column PK. We use that identity for joining etc.

I use Identity columns almost everywhere for making joins and programming easier but I *always* make sure my rows are unique (and enforce it with a constraint) without the Identity.

For data import/export and for replication, the Identity columns can be a real pain, as others have already remarked. But for such situations, Identity columns should not be included: they are not really part of the information contained in the record. If they are treated as a purely internal convenience and all your rows are unique without them, the import/export and replication problems do not arise.

nr
SQLTeam MVY

12543 Posts

Posted - 2001-12-02 : 11:39:14

A definition of a database is that every row in every table is unique - so if yuo follow this defintion and have duplicate rows then you don't have a database.

"For data import/export and for replication, the Identity columns can be a real pain, as others have already remarked. But for such situations, Identity columns should not be included: they are not really part of the information contained in the record. If they are treated as a purely internal convenience and all your rows are unique without them, the import/export and replication problems do not arise."

Nice generalisation but unfortunatly all situations should be treated uniquely. Sometimes it is good to transfer the identity, sometimes it isn't, sometimes it is essential. And the identity may be considered part of the information provided by the source.

Remember that things like identities don't have to be part of the logical model but can be added at the physical implementation stage.

==========================================
Cursors are useful if you don't know sql.
Beer is not cold and it isn't fizzy.

AjarnMark
SQL Slashing Gunting Master

3246 Posts

Posted - 2001-12-02 : 17:41:43

quote:
For the same reason Social Security Number (National Insurance number in the UK) is a popular PK.

... (passport numbers would be great, but they are not available to most users and probably illegal for this purpose).

I've got bad news for you mono... Using the SSN as a primary key is illegal, too. But many companies still do it. There are a couple other problems with SSN as PK. I have heard (no confirmation) that SSN's get re-used after people pass away. Also, not every person has an SSN. What do you do then? Put in a dummy number? Then why not just put one in anyway?

And, of course, the common argument for using Identities, "What do you do when the SSN was entered incorrectly?" You have the hassle of cascading updates. But you have this possibility with whatever field you choose as a PK.

-------------------
It's a SQL thing...

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2001-12-02 : 17:56:22

Why the bagging of cascading??

Sure it consumes resources for additional RI checking and it has a few limitations (cyclic). The extra time needed for this, IMHO is time well spent.

Yeah, yeah, large database are different. My definition of large is not total size but transactions per day!

I would rather have a natural over a surrogate key any day.
If the PK is a Identity, it is not a relation at all. You can't update it!

And people bitching about size! Bloody hell what's a Gig worth these days, $20 bucks?

And people bitching about how it is hardier to write on multiple joins! It is not, only more time consuming (by the way most of us are paid on time!). These same people never tell you that natural keys drastically reduce joins in SELECT stamtements. It most cases none are required at all!

DavidM
It gets windy at a thousand feet...."Tutorial D"

nr
SQLTeam MVY

12543 Posts

Posted - 2001-12-02 : 23:00:10

Yep SSN wouldn't be a good choice of identifier.
There is also a rumour that at some point in the 50's in the UK they made a mistake and issued duplicate national insurance numbers - now the system that issues the numbers should have this with a unique constraint.

>> what's a Gig worth these days
About 6 months of effort in some companies it seems.

>> Using the SSN as a primary key is illegal
??? there is a law against putting a constraint on a field in a database?

==========================================
Cursors are useful if you don't know sql.
Beer is not cold and it isn't fizzy.

AjarnMark
SQL Slashing Gunting Master

3246 Posts

Posted - 2001-12-03 : 11:49:43

quote:
>> Using the SSN as a primary key is illegal
??? there is a law against putting a constraint on a field in a database?

<begin disclaimer>As I was told... </disclaimer> in the U.S. it's not legal for anyone other than the U.S. Government (or maybe just the Social Security Administration) to require citizens to give their SSN. No other organization is supposed to use this number as an identifying piece of information. (I think this was to ease everyone's fears that they were being "marked by the beast"). However, it is common practice (and convenience) for companies to ask for your SSN in order to do credit checks and who knows what else. Very rarely have I heard of anyone refusing to give it, but I know a couple of folks who don't. Mostly, it's just damn inconvenient to jump through all the hoops if you refuse to give this info.

-------------------
It's a SQL thing...

Nazim
A custom title

1408 Posts

Posted - 2001-12-05 : 00:33:30

Supposing if i make a integer field as a pk , how will i tackle concurrent inserts and still keep the integer field unique(in the sense that there shouldnt be inserts fails).

i have this problem. i applied this approches but it fails.

i read the max value of trnno(which is my pk) and add a record by adding 1 to it, but when there are more then one insert at the same time , it ends up failing one insert.

How will i tackle such a problem.

i thought of using identity column but i cant afford to miss any values and morever when some1 adds a record i have to display back trnno field(which is hard capturing for me on a insert).

Any suggestion

-------------------------

Edited by - sdnazim on 12/05/2001 00:34:49

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2001-12-05 : 02:22:10

sdNazim,

Look at using "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE"

DavidM
It gets windy at a thousand feet...."Tutorial D"

Nazim
A custom title

1408 Posts

Posted - 2001-12-05 : 04:20:17

Thanx David, that helped.

but therez another problem.

in my project as there are different types of transactions , so for keeping track of what should be the next pk value for the each transaction type , i maintain a table like
whose schema is serial_no(sno ,trntype, number) whenever am adding i read the number value and add 1 to it , to the corresponding trntype.
again am confused here on how to apply serilizability on concurrent updates. this is from an Asp page.
can u xplain me in little more detail

Thanx

-------------------------

Nazim
A custom title

1408 Posts

Posted - 2001-12-06 : 02:07:59

knock knock any1 there.

Did i post this Q in wrong forum........? ,i thought its concerning Primary key issue and i posted it.

-------------------------
Nazim

nr
SQLTeam MVY

12543 Posts

Posted - 2001-12-06 : 06:39:08

you can

begin tran
add one to next ID field
if error
rollback tran - return
insert rec with new id
if error
rollback tran - return
commit tran

or use a trigger to add one to the max id on insert.

These will both ensure the id is consecutive.

==========================================
Cursors are useful if you don't know sql.
Beer is not cold and it isn't fizzy.

Nazim
A custom title

1408 Posts

Posted - 2001-12-07 : 06:25:43

nr i wrote similar code for inserting , but couple of times 2 transactions ended up having the same record no.( i didnt specify primary key on the field. coz, i wanted to make sure that it doesnt adds records with the same no ... now tht i have added primary key , it ended up in rejecting 1 user request..which is what i want to avoid).

how can i handle this better?

Thanx for the help.

-------------------------
Nazim -- "Success is when Preparedness meets Opportunity"

Edited by - sdnazim on 12/07/2001 06:26:43

nr
SQLTeam MVY

12543 Posts

Posted - 2001-12-07 : 06:52:26

Which one did you use?
It is important not to

begin tran
select @id = nextid + 1 from tbl
update tbl set nextid = @id
insert rec
commit tran

as this will not lock the table until the update.

begin tran
update tbl set @id = nextid + 1, nextid = nextid + 1
insert rec
commit tran
should work

as should
begin tran
update tbl set nextid = nextid + 1
select @id = nextid from tbl
insert rec
commit tran

==========================================
Cursors are useful if you don't know sql.
Beer is not cold and it isn't fizzy.

Mr.
Starting Member

1 Post

Posted - 2002-04-01 : 16:04:37

The problem I have with Autonumber fields is when they are used as foreign keys and specific values are tied into the business logic.

Let's say for example I have an application which has some secure features which I don't want every user to have access to. So I set up these tables:

SECURITY
--------
ID (autonumber)
LEVEL varchar(20)

MYUSERS
------
ID (autonumber)
NAME varchar(50)
SECURITYID integer

Then I add rows for "ADMINISTRATOR" and "USER" to the SECURITY table and update the user table accordingly assigning the id value to each user for the corresponding security level. Let's also assume "ADMINISTRATOR" has ID 1 and "USER" has ID 2.

I can create a simple stored proc to return whether or not the user is an administrator - now here is where things fall apart.

create proc IsAnAdmin(@UserID int) as
declare @SecID integer

Select @SecID = SecurityID
from MyUsers
where id = @UserID

if @SecID = 1
return 1
else
return 0

--end IsAnAdmin

As you can see the stored proc is using the actual VALUE of the ID. Using a numeric value makes things more difficult to figure out. Also, there is nothing special about the number 1 other than it represents the order in which "ADMINISTRATOR" was added to the database.

If this was part of an application which I decide to distribute then I must make sure that the client sets up the database and inserts the "ADMINISTRATOR" row before the "USER" row (and that the seed is set to 1).

Programmers use the term "Magic Number" to represent these kinds of static values. To avoid them most programming languages support enumeration which allows using a string of text to refer to the name instead of the number (e.g. adOpenStatic, adLockOptimistic versus 3,3). If the number values change the programmers do not have to look through all the code and change each value because of the alias(enumeration).

Following this line of thought I should change the MYUSERS.SECURITYID field to something like SECURITY_LEVEL varchar(20), fill in the actual data instead and rewrite the sql statement to read:

Select @IsAnAdmin = count(*)
from MyUsers
where SECURITY_LEVEL = "ADMINISTRATOR"

return IsAnAdmin

I feel much more comfortable with checking for the string "ADMINISTRATOR" than the number 1. The number 1 will still give me the "ADMINISTRATOR" row yet now I can blow it away and re-add it (to get ID 3) without any ill affects.

Apollois
Starting Member

49 Posts

Posted - 2002-05-09 : 14:54:58

I must say that it is apparent that Mr. Volk is not well qualified in relational database design. He makes statements that are clearly at odds with established, professionally accepted guidelines for DB design.

The main purpose of a primary key is to provide a unique identifier for relating records in one table with records in another table. Good design dictates that the PK should never be exposed to the end-user. It is NOT designed to be used for helping the end-user identify the data. Mr. Volk clearly does not understand this.

This article has so many flaws that it should be withdrawn from this web site, as it is more hurtful than helpful, especially to DB designers with limited experience.

AjarnMark
SQL Slashing Gunting Master

3246 Posts

Posted - 2002-05-09 : 18:57:37

quote:

I must say that it is apparent that Mr. Volk is not well qualified in relational database design.

Wow, do you really believe that, or are you just trolling? You follow up that statement with more opinion, so it sounds like you're sincere in your belief. I just have to jump in here and say that I believe Rob Volk to be one of the most knowledgeable RDBMS guys around who isn't out trying to pump up his own book.

Granted, the position I'm coming from is one of a definite preference toward application and a bit of disdain for theory. But, I am VERY interested in your "established, professionally accepted guidelines for DB design". I consider myself to be among the most professional in my approach to application development, including DB Design. Would you please post a reference, preferably a link to an authoritative source such as a highly respected book on relational design? If you could include specific chapters or page numbers, that would be a great help.

quote:
Good design dictates that the PK should never be exposed to the end-user.

Now granted, I haven't taken a lot of design classes, but I have worked with some excellent designers, and I have NEVER heard this claim. Are you saying that, in fact, we should go to effort to actually HIDE the PK from the end-user? It sounds that way when you use words like NEVER.

While I have a bit of an affinity toward using IDENTITY fields (yes, I'm actually disagreeing with Mr. Volk's article here) I do believe that Rob has put forth a good argument and people should seriously consider it. I'd say this is an excellent article and should be retained as a good counter-balance to the auto-number craze. And the longer I'm at this site, the more I hear people asking about "how to delete duplicates" which Rob points out in his article.

Edited by - AjarnMark on 05/09/2002 19:00:50

Merkin
Funky Drop Bear Fearing SQL Dude!

4970 Posts

Posted - 2002-05-09 : 19:21:30

quote:

I must say that it is apparent that Mr. Volk is not well qualified in relational database design. He makes statements that are clearly at odds with established, professionally accepted guidelines for DB design.

This coming from a guy who did his website in Microsoft Word

Damian

Previous Page&nsp; Next Page

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources