Author |
Topic |
Surfer513
Starting Member
29 Posts |
Posted - 2010-12-23 : 21:10:08
|
What is the best datatype to use as a primary key in a table? I know that a lot of people like to use a column of type int, but that really limits the table to 2,147,483,647 records, doesn't it? I mean, what happens when the table grows past 2 billion records? What's the advice on designing a table that could potentially extend beyond the limits of the int datatype? Thanks in advance. :) |
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2010-12-23 : 22:04:29
|
bigint. Your app requires more than 2 billion records in a single table? By the way, the range of int is -2,147,483,648 through 2,147,483,647 -- 4 billion + values |
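For reference, a minimal T-SQL sketch contrasting the two choices (table and column names here are invented for illustration):

```sql
-- int: 4 bytes, range -2,147,483,648 to 2,147,483,647
CREATE TABLE dbo.OrdersInt
(
    OrderID   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    OrderDate datetime2         NOT NULL
);

-- bigint: 8 bytes, range roughly -9.2 to +9.2 quintillion
CREATE TABLE dbo.OrdersBig
(
    OrderID   bigint IDENTITY(1,1) NOT NULL PRIMARY KEY,
    OrderDate datetime2            NOT NULL
);
```

Note that bigint doubles the key storage (8 bytes vs 4), and that cost ripples into every nonclustered index that carries the key.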
|
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2010-12-23 : 22:05:57
|
Also, we can't tell you the best primary key unless we know more about your data. Maybe it's a composite of several columns. But in general, int is a solid choice. Like so many things, the answer is -- it depends... |
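As a hypothetical illustration of a composite primary key -- the table and columns are invented for the example -- the key is simply built from the columns that already uniquely identify the row:

```sql
-- A composite PK built from columns that together identify the row
CREATE TABLE dbo.OrderLines
(
    OrderID int         NOT NULL,
    LineNbr smallint    NOT NULL,
    Sku     varchar(20) NOT NULL,
    Qty     int         NOT NULL,
    CONSTRAINT PK_OrderLines PRIMARY KEY (OrderID, LineNbr)
);
```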
|
|
Sachin.Nand
2937 Posts |
Posted - 2010-12-24 : 00:17:29
|
To add to this: int is a good choice if you have a clustered index on it.

PBUH |
|
|
Surfer513
Starting Member
29 Posts |
Posted - 2010-12-24 : 11:12:50
|
I see what you're saying. As far as the data goes, there are 3 fields (besides the pk): user name (nvarchar), billing (decimal(10,2)), and another nvarchar that is not specific to the user or the record. I figured the capacity of int is cut in half when you do identity(1,1), since you can't take advantage of the lower half (negative) numbers. And I didn't think there was any way to do an unsigned int in SQL. Am I wrong? |
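There is indeed no unsigned int in SQL Server, but if you want the full signed range with an identity column, one approach (sketched here with a hypothetical table) is to seed the identity at the minimum int value, so the column climbs through the negative numbers first:

```sql
-- Seed at the minimum int value to use all ~4.29 billion values
CREATE TABLE dbo.Billing
(
    BillingID int IDENTITY(-2147483648, 1) NOT NULL PRIMARY KEY,
    UserName  nvarchar(100) NOT NULL,
    Amount    decimal(10,2) NOT NULL
);
```

One caveat: negative surrogate keys can surprise downstream code that assumes IDs are positive.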
|
|
Kristen
Test
22859 Posts |
Posted - 2010-12-24 : 11:20:28
|
If you are going to have 4 billion user records (or even 2 billion, for that matter) you have a lot more to think about than just what datatype to use for the PK! |
|
|
Surfer513
Starting Member
29 Posts |
Posted - 2010-12-24 : 12:30:26
|
Thank you very much for the input. Kristen, what are some of those other considerations? It is a fairly simple database structure. Can you point me in the right direction as far as other aspects to consider? Thank you! |
|
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2010-12-24 : 13:19:07
|
If you inserted 10 records a second non-stop for 13 years, you still wouldn't run out of numbers using an INT. What are you doing that requires bigint? Some other considerations, if you really do need that many rows, are:

- Storage: disk layout, size, and speed
- CPU: number and speed
- RAM
- Partitioning and indexing
- Backups
- Recovery plan

To name a few. |
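The arithmetic behind that 13-year figure can be checked with a quick query (10 rows per second, non-stop):

```sql
-- 10 rows/sec * 60 sec * 60 min * 24 hr * 365 days * 13 years
SELECT CAST(10 AS bigint) * 60 * 60 * 24 * 365 * 13 AS RowsIn13Years;
-- 4,099,680,000
```

That total exceeds int's positive half (2,147,483,647) but fits within the full signed range (~4.29 billion values), so the 13-year claim assumes the identity is seeded at -2,147,483,648 rather than the default 1.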
|
|
Kristen
Test
22859 Posts |
Posted - 2010-12-24 : 13:54:13
|
"what are some other considerations?"How soon will you have 2 billion records in that table? Unless it is anytime-soon further discussion is rather moot. If you WILL have 2 billions rows in a single table then you need lots of experience and knowledge, and I would recommend retaining a consultant, or employing someone with that knowledge - applications which handle that sort of data volume are not just thrown together on a Sunday afternoon!. |
|
|
Surfer513
Starting Member
29 Posts |
Posted - 2010-12-24 : 20:32:33
|
Kristen, thank you for the advice, but I will not be having that many records in the table soon, if at all. I am a developer by nature and I just hate limiting myself, and I know that the wrong data type selection is the quickest way to do that. I will learn as my db grows. I love SQL Server and am an aspiring db developer. I will learn the hard way! I guess I just overreacted with the pk data type choice. Thank you, all! |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-12-25 : 05:49:55
|
Best to restrict each datatype to the minimum deemed necessary -- no point having massive columns for Name and Address etc.; restrict them to something reasonable. Same with the ID -- INT is plenty except in exceptional, massive databases. |
|
|
BruceT
Yak Posting Veteran
78 Posts |
Posted - 2010-12-28 : 14:27:33
|
Another thing to be careful of: if you need to do replication, identity columns as primary keys can cause problems. |
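For example, when identity columns participate in replication, SQL Server manages identity ranges per node, and a common mitigation is to mark the column NOT FOR REPLICATION so that rows arriving from replication don't consume the local identity range (table and column names here are hypothetical):

```sql
-- Replication agents insert explicit IDs; only local inserts draw
-- from this server's identity range
CREATE TABLE dbo.Customers
(
    CustomerID int IDENTITY(1,1) NOT FOR REPLICATION NOT NULL
        CONSTRAINT PK_Customers PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL
);
```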
|
|
jcelko
Esteemed SQL Purist
547 Posts |
Posted - 2010-12-28 : 16:50:30
|
>> What is the best data type to use as a PRIMARY KEY in a table? <<

This is a reallllllly bad question. This is like asking what the best color of ink to write code is; does red or blue ink make it run faster? I stopped reading at this point and made a bet with myself that in the rest of the post you would confuse rows with records. Then I scrolled down to:

>> I know that a lot of people like to use a constraint of type INTEGER, but that really limits the table to 2,147,483,647 records [sic]. Doesn't it? I mean, what happens when the table grows over 2 billion records [sic]? <<

Yep! You have absolutely no idea how an RDBMS works. Read a book, any book. A PRIMARY KEY is a subset of attributes of an entity that uniquely identifies each occurrence of the entity. VINs identify vehicles; ISBNs identify books; UPC barcodes identify retail goods. What you want is a PHYSICAL record number in a file system for PHYSICAL access control. This has nothing to do with the concept of a PRIMARY KEY. The idea of a magical, universal number to identify anything in creation is Kabbalah numerology, not math or science.

When you look for a key, you start with the standards of your industry. I would estimate that this covers ~80% of properly defined systems -- VIN, ISBN, UPC, EAN, etc., which can be verified and validated via a trusted external source. In the old days, this took time; you can Google it today. After that, there is a natural key in the data, such as (longitude, latitude) or (store, cash_register, ticket_nbr), which can be verified and validated in the reality of the data. This is ~18% of the cases. If you have no natural key, then your schema is probably wrong. Then, if that fails, and we are looking at <2% of all situations, we invent a key with check digits, validation rules, and an audit trail within our enterprise. This is a great screaming pain IF you do it right.

This is why industry standards exist -- people got tired of the pain. Would you like to do retail without standard UPC barcodes on products? We did that for centuries before barcodes. So newbies prefer to do it wrong by using auto-increments or other proprietary stuff that cannot be verified or validated, BECAUSE it is easier than real RDBMS design. They want a magical, universal "one-size-fits-all" answer that does not require them to ACTUALLY understand the problem domain. A key without validation and verification will only lead to dirty data. And after all the overhead of the "Magical Universal Elixir UID" they still have to assure that the relational key is unique.

--CELKO--
Books in the Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL; Data and Databases: Concepts in Practice; Data, Measurements and Standards in SQL; SQL for Smarties; SQL Programming Style; SQL Puzzles and Answers; Thinking in Sets; Trees and Hierarchies in SQL |
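As a sketch of the "verify and validate" idea, a CHECK constraint can at least enforce the shape of an industry-standard key. This is a simplified, hypothetical example -- a real ISBN-13/EAN validation would also compute and verify the check digit:

```sql
-- Enforce that the natural key is exactly 13 digits
CREATE TABLE dbo.Books
(
    isbn13     char(13)      NOT NULL
        CONSTRAINT PK_Books PRIMARY KEY,
    book_title nvarchar(200) NOT NULL,
    CONSTRAINT chk_isbn13_digits
        CHECK (isbn13 NOT LIKE '%[^0-9]%' AND LEN(isbn13) = 13)
);
```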
|
|
spockmonster
Starting Member
2 Posts |
Posted - 2013-01-09 : 13:19:57
|
You might accumulate more than 2 billion rows over time, even if many of those rows have been purged. For example, if you have 10 million customers who, over 10 years' time, create 1,000 transactions each, at the end of 10 years your TransactionId will be 10,000,000,000 (10 billion). Your system will likely purge transactions every year, but your primary key will still exceed the range of a 32-bit int. |
|
|
spockmonster
Starting Member
2 Posts |
Posted - 2013-01-09 : 14:17:18
|
Oooo, I had a really nice response, but I forgot my username/password, so when I posted, it said "Invalid username", and when I went back to the "edit response" page, my text was all gone. So all of you readers don't get to see the advice I contributed, because of the way this board's server code is written (poorly). |
|
|
ScottPletcher
Aged Yak Warrior
550 Posts |
Posted - 2013-01-09 : 17:26:30
|
If you believe an int would overflow, use a bigint. Not a big deal. But NEVER, by default, use the PK and/or an identity column as the clustering index. NEVER! That is THE most common mistake made, and it can destroy your performance. Analyze the data to select the best clustering key. And remember that the PK and the clustering key do NOT have to be the same. |
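The point that the primary key and the clustering key can differ looks like this in T-SQL (a hypothetical sketch; whether it actually helps depends entirely on your query patterns):

```sql
-- Nonclustered PK on the surrogate; cluster on the column queried by range
CREATE TABLE dbo.Transactions
(
    TransactionID bigint IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Transactions PRIMARY KEY NONCLUSTERED,
    CustomerID    int       NOT NULL,
    TranDate      datetime2 NOT NULL
);

-- TransactionID is appended to keep the clustering key unique
CREATE CLUSTERED INDEX CIX_Transactions_TranDate
    ON dbo.Transactions (TranDate, TransactionID);
```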
|
|
Jeff Moden
Aged Yak Warrior
652 Posts |
Posted - 2013-01-09 : 20:15:27
|
quote: Originally posted by ScottPletcher

If you believe an int would overflow, use a bigint. Not a big deal. But NEVER, by default, use the PK and/or an identity column as the clustering index. NEVER! That is THE most common mistake made, and it can destroy your performance. Analyze the data to select the best clustering key. And remember that the PK and the clustering key do NOT have to be the same.

Considering that the best recommendations for a clustered index are "narrow, unique, and ever-increasing", why are you suggesting to "never" use identity for the clustering index? Since the clustered index key columns are added to every nonclustered index that doesn't explicitly include them, a narrow identity clustering key almost never destroys performance and, in fact, is strongly recommended on larger tables for the sake of performance.

--Jeff Moden
RBAR is pronounced "ree-bar" and is a "Modenism" for "Row By Agonizing Row".
First step towards the paradigm shift of writing Set Based code: "Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
When writing schedules, keep the following in mind: "If you want it real bad, that's the way you'll likely get it." |
|
|
|