SQL Server Forums
Profile | Register | Active Topics | Members | Search | Forum FAQ
 
Register Now and get your question answered!
Username:
Password:
Save Password
Forgot your Password?

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Primary Key - What datatype to use??
 New Topic  Reply to Topic
 Printer Friendly
Author Previous Topic Topic Next Topic  

Surfer513
Starting Member

29 Posts

Posted - 12/23/2010 :  21:10:08  Show Profile  Reply with Quote
What is the best datatype to use as a primary key in a table? I know that a lot of people like to use a constraint of type int, but that really limits the table to 2,147,483,647 records. Doesn't it? I mean, what happens when the table grows over 2 billion records?

What's the advice on this sort of thing to designing a table that could potentially extend beyond the limits of the int datatype?

Thanks in advance. :)

russell
Pyro-ma-ni-yak

USA
5072 Posts

Posted - 12/23/2010 :  22:04:29  Show Profile  Visit russell's Homepage  Reply with Quote
bigint.

Your app requires more than 2 billion records in a single table?

By the way, the limit of int is -2,147,483,647 through 2,147,483,647 -- 4 billion + values
Go to Top of Page

russell
Pyro-ma-ni-yak

USA
5072 Posts

Posted - 12/23/2010 :  22:05:57  Show Profile  Visit russell's Homepage  Reply with Quote
Also, can't tell you the best Primary Key unless we know more about your data. Maybe it's a composite of several columns.

But in general, int is a solid choice.

Like so many things, the answer is -- it depends...
Go to Top of Page

Sachin.Nand
Flowing Fount of Yak Knowledge

2937 Posts

Posted - 12/24/2010 :  00:17:29  Show Profile  Reply with Quote
To add to this int is a good choice if you have a clustered index on it.

PBUH

Go to Top of Page

Surfer513
Starting Member

29 Posts

Posted - 12/24/2010 :  11:12:50  Show Profile  Reply with Quote
I see what you're saying. As far as the data, there are 3 fields (besides the pk): user name (nvarchar), billing (decimal[10,2]), and another nvarchar that is not specific to the user or the record.

I figured the capabilities of int was divided in half when you do identity(1,1), so I can't take advantage of the lower half (negative) numbers. And I didn't think there was any way to do an unsigned int in SQL. Am I wrong?
Go to Top of Page

tkizer
Almighty SQL Goddess

USA
36985 Posts

Posted - 12/24/2010 :  11:14:44  Show Profile  Visit tkizer's Homepage  Reply with Quote
Int supports negative numbers and so does identity. Identity(-2,147,483,647, 1). Int uses 4 bytes for storage, and bigint uses 8 bytes. So be very careful with this. If you don't need more than 4 billion identity values, then don't use bigint.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

Kristen
Test

United Kingdom
22403 Posts

Posted - 12/24/2010 :  11:20:28  Show Profile  Reply with Quote
If you are going to have 4 billion user records (or even 2 billion for that matter) you have a lot more to think about then just what datatype to use for the PK !!!!!
Go to Top of Page

Surfer513
Starting Member

29 Posts

Posted - 12/24/2010 :  12:30:26  Show Profile  Reply with Quote
Thank you very much for the input. Kristen, like what are some other considerations? It is a fairly simple database structure. Can you point me in the right direction as far as other aspects to consider?

Thank you!
Go to Top of Page

russell
Pyro-ma-ni-yak

USA
5072 Posts

Posted - 12/24/2010 :  13:19:07  Show Profile  Visit russell's Homepage  Reply with Quote
If you insert 10 records a second non-stop for 13 years you still wouldn't run out of numbers using an INT.

What are you doing that requires bigint?

Some other considerations if you really need that many are:
Storage, Disk layout, size and speed.
CPU, # and speed
RAM
Partitioning and indexing
Backups
Recovery plan

To name a few.

Edited by - russell on 12/24/2010 13:20:06
Go to Top of Page

Kristen
Test

United Kingdom
22403 Posts

Posted - 12/24/2010 :  13:54:13  Show Profile  Reply with Quote
"what are some other considerations?"

How soon will you have 2 billion records in that table? Unless it is anytime-soon further discussion is rather moot. If you WILL have 2 billions rows in a single table then you need lots of experience and knowledge, and I would recommend retaining a consultant, or employing someone with that knowledge - applications which handle that sort of data volume are not just thrown together on a Sunday afternoon!.
Go to Top of Page

Surfer513
Starting Member

29 Posts

Posted - 12/24/2010 :  20:32:33  Show Profile  Reply with Quote
Kristen, thank you for the advice but I will not be having that many records in the table soon, if at all. I am a developer by nature and I just hate limiting myself, and I know that the wrong data type selection is the quickest way to get there.

I will learn as my db grows. I love SQL Server and am an aspiring db developer. I will learn the hard way! I guess I just overreacted with the pk data type choice.

Thank you, all!
Go to Top of Page

Kristen
Test

United Kingdom
22403 Posts

Posted - 12/25/2010 :  05:49:55  Show Profile  Reply with Quote
Best to restrict the datatype to the minimum deemed necessary - no point having massive columns for Name and Address etc., restrict them to something reasonable. Same with the ID - INT is plenty except in exceptional, massive, databases
Go to Top of Page

BruceT
Yak Posting Veteran

USA
77 Posts

Posted - 12/28/2010 :  14:27:33  Show Profile  Reply with Quote
Another thing to be careful of, if you need to do replication identity columns as primary keys can cause problems.
Go to Top of Page

jcelko
Esteemed SQL Purist

USA
547 Posts

Posted - 12/28/2010 :  16:50:30  Show Profile  Visit jcelko's Homepage  Reply with Quote
>> What is the best data type to use as a PRIMARY KEY in a table?<<

This is a reallllllly bad question. This is like asking what the best color of ink to write code is; does red or blue ink make it run faster?

I stopped reading at this point and made a bet with myself that in the rest of the post you would confuse rows with records. Then I scrolled down yo:

>> I know that a lot of people like to use a constraint of type INTEGER, but that really limits the table to 2,147,483,647 records [sic]. Doesn't it? I mean, what happens when the table grows over 2 billion records [sic]? <<

Yep! You have absolutely no idea how RDBMS works. Read a book, any book. A PRIMARY KEY is a subset of attributes of an entity that uniquely identifies each occurrence of the entity. VINs identify vehicles; ISBNs identify books. UPS bar codes identify retails goods.

What you want is a PHYSICAL record number in a file system for PHYSICAL access control. This has nothing to do with the concpet of a PRIMARY KEY.

The idea of a magical, universal number to identify anything in creation is Khabalah numerology and not math or science.

When you look for a key, you start with the standards of your industry. I would estimate that this covers ~80% of the properly defined systems -- VIN, ISBN, UPC, EAN, etc. which can be verified and validated via a trusted external source. In the old days, this took time; you can Google it today.

After that, there is a natural key in the data, such as (longitude, latitude), store, cash register, ticket_nbr, etc which can be verified and validated in the reality of the data. This is ~18% of the cases. if you have no natural key, then your schema is probably wrong.

Then, if that fails and we are looking at <2% of all situations, we invent a key with check digits, validation rules and an audit trail within our enterprise. This is a great screaming pain IF you do it right. This is why industry standards exists -- people got tired the pain Would you like to do Retail without standard UPC barcodes on products? We did that for centuries before barcodes.

So, Newbies prefer to do it wrong by using auto-increments or other proprietary stuff that cannot be verified or validated BECAUSE it is easier than real RDBMS design. They want to have a magical, universal "one-size-fits-all" answer that does not require you ACTUALLY understand the problem domain.

A key without validation and verification will only lead to dirty data. And after all the overhead of the “Magical Universal Elixir UID” they still have to assure that the relational key is unique.


--CELKO--
Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Go to Top of Page

spockmonster
Starting Member

USA
2 Posts

Posted - 01/09/2013 :  13:19:57  Show Profile  Reply with Quote
You might accumulate more than 2 billion rows over time, even if many of those rows have been purged. For example, if you have 10 million customers who, over 10 years time, create 1,000 transactions each, at the end of 10 years your TransactionId will be 10,000,000,000 (10 Billion). Your system will likely purge transactions every year, but your Primary Key will exceed the range of a 32-bit smallint.

Go to Top of Page

spockmonster
Starting Member

USA
2 Posts

Posted - 01/09/2013 :  14:17:18  Show Profile  Reply with Quote
Oooo, I had a really nice response, but I forgot my username/password, and so when I Posted, it said "Invalid username" and when I went back to the "edit response" page, my text was all gone.

so all of you readers don't get to see the advice I contributed because of the way this board's server code is written (poorly).
Go to Top of Page

ScottPletcher
Constraint Violating Yak Guru

USA
381 Posts

Posted - 01/09/2013 :  17:26:30  Show Profile  Reply with Quote
If you believe an int would overflow, use a bigint. Not a big deal.

But NEVER by default use the PK and/or an identity column as the clustering index. NEVER!! That's is THE most common mistake made, and it can destroy your performance.

Analyze the data to select the best clustering key. And remember that the PK and the clustering key do NOT have to be the same.
Go to Top of Page

Jeff Moden
Aged Yak Warrior

USA
649 Posts

Posted - 01/09/2013 :  20:15:27  Show Profile  Reply with Quote
quote:
Originally posted by ScottPletcher

If you believe an int would overflow, use a bigint. Not a big deal.

But NEVER by default use the PK and/or an identity column as the clustering index. NEVER!! That's is THE most common mistake made, and it can destroy your performance.

Analyze the data to select the best clustering key. And remember that the PK and the clustering key do NOT have to be the same.


Considering that the best recommendations for a clustered index are "narrow, unique, and ever increasing", why are you suggesting to "never" use identity for the clustering index? The real fact is that since the clustered index columns are added to every nonclustered index that doesn't have them specifically listed, you'll also understand that it almost never destroys performance and, in fact, is strongly recommended on larger tables for the sake of performance.

--Jeff Moden
RBAR is pronounced "ree-bar" and is a "Modenism" for "Row By Agonizing Row".

First step towards the paradigm shift of writing Set Based code:
"Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

When writing schedules, keep the following in mind:
"If you want it real bad, that's the way you'll likely get it."
Go to Top of Page
  Previous Topic Topic Next Topic  
 New Topic  Reply to Topic
 Printer Friendly
Jump To:
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC Go To Top Of Page
This page was generated in 0.23 seconds. Powered By: Snitz Forums 2000