
 Primary Key - What datatype to use??


Surfer513
Starting Member

29 Posts

Posted - 2010-12-23 : 21:10:08
What is the best datatype to use as a primary key in a table? I know that a lot of people like to use an int identity column, but that really limits the table to 2,147,483,647 records, doesn't it? I mean, what happens when the table grows past 2 billion records?

What's the advice for designing a table that could potentially grow beyond the limits of the int datatype?

Thanks in advance. :)

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2010-12-23 : 22:04:29
bigint.

Your app requires more than 2 billion records in a single table?

By the way, the range of int is -2,147,483,648 through 2,147,483,647 -- about 4.3 billion values in total.

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2010-12-23 : 22:05:57
Also, can't tell you the best Primary Key unless we know more about your data. Maybe it's a composite of several columns.

But in general, int is a solid choice.

Like so many things, the answer is -- it depends...

Sachin.Nand

2937 Posts

Posted - 2010-12-24 : 00:17:29
To add to this, int is a good choice if you have a clustered index on it.

PBUH


Surfer513
Starting Member

29 Posts

Posted - 2010-12-24 : 11:12:50
I see what you're saying. As far as the data, there are 3 fields (besides the pk): user name (nvarchar), billing (decimal[10,2]), and another nvarchar that is not specific to the user or the record.

I figured the capacity of int is cut in half when you use identity(1,1), since I can't take advantage of the lower (negative) half of the range. And I didn't think there was any way to declare an unsigned int in SQL Server. Am I wrong?

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-12-24 : 11:14:44
Int supports negative numbers and so does identity: IDENTITY(-2147483648, 1). Int uses 4 bytes for storage, and bigint uses 8 bytes. So be very careful with this. If you don't need more than 4 billion identity values, then don't use bigint.
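A minimal sketch of seeding an identity at the bottom of the int range (the table and column names here are just illustrative):

```sql
-- Start the identity at the minimum int value (-2,147,483,648) so the
-- column can use all ~4.29 billion int values, not just the positive half.
CREATE TABLE dbo.UserBilling
(
    UserBillingID int IDENTITY(-2147483648, 1) NOT NULL
        CONSTRAINT PK_UserBilling PRIMARY KEY,
    UserName nvarchar(100)  NOT NULL,
    Billing  decimal(10, 2) NOT NULL
);
```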

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

Kristen
Test

22859 Posts

Posted - 2010-12-24 : 11:20:28
If you are going to have 4 billion user records (or even 2 billion for that matter) you have a lot more to think about than just what datatype to use for the PK !!!!!

Surfer513
Starting Member

29 Posts

Posted - 2010-12-24 : 12:30:26
Thank you very much for the input. Kristen, like what are some other considerations? It is a fairly simple database structure. Can you point me in the right direction as far as other aspects to consider?

Thank you!

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2010-12-24 : 13:19:07
If you insert 10 records a second non-stop, it would take over 13 years to run out of numbers using an INT (assuming you seed the identity at the bottom of the range).

What are you doing that requires bigint?

Some other considerations if you really need that many are:
Storage, Disk layout, size and speed.
CPU, # and speed
RAM
Partitioning and indexing
Backups
Recovery plan

To name a few.
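The arithmetic behind that estimate can be checked directly (a sketch; the 4,294,967,296 figure assumes the full int range via a negative identity seed):

```sql
-- 10 inserts/second, 31,536,000 seconds/year
SELECT 4294967296.0 / (10 * 31536000.0) AS YearsFullIntRange,   -- ~13.6 years
       2147483647.0 / (10 * 31536000.0) AS YearsPositiveOnly;   -- ~6.8 years
```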

Kristen
Test

22859 Posts

Posted - 2010-12-24 : 13:54:13
"what are some other considerations?"

How soon will you have 2 billion records in that table? Unless it is any time soon, further discussion is rather moot. If you WILL have 2 billion rows in a single table then you need lots of experience and knowledge, and I would recommend retaining a consultant, or employing someone with that knowledge - applications which handle that sort of data volume are not just thrown together on a Sunday afternoon!

Surfer513
Starting Member

29 Posts

Posted - 2010-12-24 : 20:32:33
Kristen, thank you for the advice, but I will not be having that many records in the table soon, if at all. I am a developer by nature and I just hate limiting myself, and I know the wrong datatype choice is the quickest way to do that.

I will learn as my db grows. I love SQL Server and am an aspiring db developer. I will learn the hard way! I guess I just overreacted with the pk data type choice.

Thank you, all!

Kristen
Test

22859 Posts

Posted - 2010-12-25 : 05:49:55
Best to restrict each datatype to the minimum deemed necessary - no point having massive columns for Name, Address, etc.; restrict them to something reasonable. Same with the ID: INT is plenty except in exceptionally massive databases.

BruceT
Yak Posting Veteran

78 Posts

Posted - 2010-12-28 : 14:27:33
Another thing to be careful of: if you need to do replication, identity columns as primary keys can cause problems.
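One common mitigation (a sketch, with illustrative table and column names) is to mark the identity NOT FOR REPLICATION, so the replication agent can insert the publisher's explicit values without disturbing the subscriber's identity seed:

```sql
CREATE TABLE dbo.Orders
(
    OrderID int IDENTITY(1, 1) NOT FOR REPLICATION NOT NULL
        CONSTRAINT PK_Orders PRIMARY KEY,
    OrderDate datetime NOT NULL
);
```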

jcelko
Esteemed SQL Purist

547 Posts

Posted - 2010-12-28 : 16:50:30
>> What is the best data type to use as a PRIMARY KEY in a table?<<

This is a reallllllly bad question. This is like asking what the best color of ink to write code is; does red or blue ink make it run faster?

I stopped reading at this point and made a bet with myself that in the rest of the post you would confuse rows with records. Then I scrolled down to:

>> I know that a lot of people like to use a constraint of type INTEGER, but that really limits the table to 2,147,483,647 records [sic]. Doesn't it? I mean, what happens when the table grows over 2 billion records [sic]? <<

Yep! You have absolutely no idea how an RDBMS works. Read a book, any book. A PRIMARY KEY is a subset of attributes of an entity that uniquely identifies each occurrence of the entity. VINs identify vehicles; ISBNs identify books; UPC bar codes identify retail goods.

What you want is a PHYSICAL record number in a file system for PHYSICAL access control. This has nothing to do with the concept of a PRIMARY KEY.

The idea of a magical, universal number to identify anything in creation is Kabbalah numerology, not math or science.

When you look for a key, you start with the standards of your industry. I would estimate that this covers ~80% of the properly defined systems -- VIN, ISBN, UPC, EAN, etc. which can be verified and validated via a trusted external source. In the old days, this took time; you can Google it today.

After that, there is a natural key in the data, such as (longitude, latitude) or (store, cash register, ticket_nbr), which can be verified and validated in the reality of the data. This is ~18% of the cases. If you have no natural key, then your schema is probably wrong.

Then, if that fails and we are looking at <2% of all situations, we invent a key with check digits, validation rules and an audit trail within our enterprise. This is a great screaming pain IF you do it right. This is why industry standards exist -- people got tired of the pain. Would you like to do retail without standard UPC barcodes on products? We did that for centuries before barcodes.

So newbies prefer to do it wrong by using auto-increments or other proprietary stuff that cannot be verified or validated BECAUSE it is easier than real RDBMS design. They want a magical, universal "one-size-fits-all" answer that does not require them to ACTUALLY understand the problem domain.

A key without validation and verification will only lead to dirty data. And after all the overhead of the "Magical Universal Elixir UID", they still have to assure that the relational key is unique.
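As a sketch of that verify-and-validate point, a natural key can carry its own format check in the DDL (this CHECK only validates the shape of an ISBN-13, not its check digit; the names are illustrative):

```sql
CREATE TABLE Books
(
    isbn CHAR(13) NOT NULL
        CONSTRAINT PK_Books PRIMARY KEY,
    book_title VARCHAR(200) NOT NULL,
    -- ISBN-13 starts with 978 or 979 followed by ten digits.
    CONSTRAINT CK_Books_isbn CHECK
        (isbn LIKE '97[89][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]')
);
```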


--CELKO--
Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL

spockmonster
Starting Member

2 Posts

Posted - 2013-01-09 : 13:19:57
You might accumulate more than 2 billion rows over time, even if many of those rows have been purged. For example, if you have 10 million customers who, over 10 years' time, create 1,000 transactions each, at the end of 10 years your TransactionId will be around 10,000,000,000 (10 billion). Your system will likely purge transactions every year, but your primary key values will still exceed the range of a 32-bit int.
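The point that purged rows don't give their identity values back can be seen with DBCC CHECKIDENT (a sketch; dbo.Transactions and TransactionDate are made-up names):

```sql
-- Purging old rows frees space, but the identity counter keeps climbing.
DELETE FROM dbo.Transactions
WHERE TransactionDate < DATEADD(year, -1, GETDATE());

-- Reports the current identity value, which the DELETE above did not lower.
DBCC CHECKIDENT ('dbo.Transactions', NORESEED);
```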


spockmonster
Starting Member

2 Posts

Posted - 2013-01-09 : 14:17:18
Oooo, I had a really nice response, but I forgot my username/password, and so when I Posted, it said "Invalid username" and when I went back to the "edit response" page, my text was all gone.

So all of you readers don't get to see the advice I contributed, because of the way this board's server code is written (poorly).

ScottPletcher
Aged Yak Warrior

550 Posts

Posted - 2013-01-09 : 17:26:30
If you believe an int would overflow, use a bigint. Not a big deal.

But NEVER by default use the PK and/or an identity column as the clustering index. NEVER!! That is THE most common mistake made, and it can destroy your performance.

Analyze the data to select the best clustering key. And remember that the PK and the clustering key do NOT have to be the same.
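A sketch of keeping the two separate (names illustrative): the surrogate PK is enforced with a nonclustered index, and the clustering key is chosen to match the dominant range queries:

```sql
CREATE TABLE dbo.Activity
(
    ActivityID   int IDENTITY(1, 1) NOT NULL
        CONSTRAINT PK_Activity PRIMARY KEY NONCLUSTERED,
    CustomerID   int      NOT NULL,
    ActivityDate datetime NOT NULL
);

-- Cluster on the columns the big range scans actually filter on.
CREATE CLUSTERED INDEX CIX_Activity_Customer_Date
    ON dbo.Activity (CustomerID, ActivityDate);
```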

Jeff Moden
Aged Yak Warrior

652 Posts

Posted - 2013-01-09 : 20:15:27
quote:
Originally posted by ScottPletcher

If you believe an int would overflow, use a bigint. Not a big deal.

But NEVER by default use the PK and/or an identity column as the clustering index. NEVER!! That is THE most common mistake made, and it can destroy your performance.

Analyze the data to select the best clustering key. And remember that the PK and the clustering key do NOT have to be the same.


Considering that the best recommendations for a clustered index are "narrow, unique, and ever increasing", why are you suggesting to "never" use identity for the clustering index? Since the clustered index columns are added to every nonclustered index that doesn't explicitly include them, a narrow identity clustering key almost never destroys performance and, in fact, is often recommended on larger tables precisely for the sake of performance.

--Jeff Moden
RBAR is pronounced "ree-bar" and is a "Modenism" for "Row By Agonizing Row".

First step towards the paradigm shift of writing Set Based code:
"Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

When writing schedules, keep the following in mind:
"If you want it real bad, that's the way you'll likely get it."
   
