SQL Server Forums
Profile | Register | Active Topics | Members | Search | Forum FAQ
 
Register Now and get your question answered!
Username:
Password:
Save Password
Forgot your Password?

 All Forums
 Site Related Forums
 Article Discussion
 Article: Identity and Primary Keys
 New Topic  Reply to Topic
 Printer Friendly
Previous Page | Next Page
Author Previous Topic Topic Next Topic
Page: of 8

quazibubble
Starting Member

USA
25 Posts

Posted - 12/16/2002 :  23:31:06  Show Profile  Visit quazibubble's Homepage  Reply with Quote
These are realistic, real-world requirements, that are in no way unnatural or cleverly created to set up a straw-man argument. Now let's whittle an UNK out of this:

* SSN and DL are good starting candidates, but we aren't guaranteed to have them. Furthermore, they are not guaranteed to be unique, per the business requirements.

* The Customer ID defined by the business is unique, but to avoid having two columns that must be unique, the Customer ID itself is the very thing we are trying to define here. We will give the customer their ID as soon as we figure out what it is.

* Since a person alone is not necessarily a unique customer, before we can even worry about an UNK/CustomerID, we must first identify a combination of person, plus other attributes that define a unique record (as required by principles of rational design). We'll make it a rule that a person + street address + signup date defines a unique record. We'll make the simplifying assumption that a unique customer can only sign up and cancel once per day. Getting a unique person could still be a problem, however. We will make a further simplifying assumption that if multiple people of the same name exist at the same address (an office building perhaps), and sign up on the same day, it is up to them to tell the CSR how to further distinguish their name, when the CSR realizes the insert failed.

* So now we have <full customer name><address><signup date>. This might be a huge value. We need to shorten it. We can removing spaces and formatting characters from everything, and make it upper-case. Going further, we can also only use the first X characters from each atomic address and name field, but this could easily compromise uniqueness. We're going to have to live with that.

* To further shorten our synthetic UNK, we could perform a hash on the full result, which would be arbitrarily smaller. The hash uniqueness is inversely proportional to it's size delta from the original. However, this will destroy the human readability and meaning, a primary objective of an UNK.

* We could also consider the full concatenated value to be a base-36 number, and then convert it to a base-62 number, in order to shorten it a little more. (Just as 0-9 and A-F, for a total of 16 discreet digits, is symbolically used to represent a base-16 number [hexadecimal], we can consider 0-9 and A-Z as symbolically representing a base-36 number; then convert that to a base-62 number, represented by 0-9, A-Z, and a-z.) The result will be somewhat smaller, and is guaranteed to be exactly as unique as the original string. But again, this would destroy the human readability and meaning required of an UNK (although it is reversible).

We are realizing that we must make some tough choices, between:

1) A multi-column PK, at minimum using 3-columns (or more since name and address are multiple fields).

2) A very large synthetic single-column UNK, using name, address, and signup date.

3) A shorter hash or numeric base-62 representation.

(continued...)

Go to Top of Page

quazibubble
Starting Member

USA
25 Posts

Posted - 12/16/2002 :  23:32:01  Show Profile  Visit quazibubble's Homepage  Reply with Quote
So let's assume we decided on one of the choices above, and have simply accepted it's inherent drawbacks as a fact of life. Unfortunately, we are not done there! Nosirree, nothing is that simple. If you leave it at that, the system is still broken, you just don't know it yet. Think. Look into the future. What if, long after you leave, one or more of these things happen?

* The database schema gets a freshen-up. Address attributes are moved into their own 1:M child entity, as is the fad with many shrink-wrapped enterprise data solutions.

* The company goes global, and wants their application and data localizable to different cultures? You're in a global world of hurt with UNKs.

* The company buys another company. The purchased company uses the same algorithm for generating Customer IDs. The new combined CEO quite naturally wants to merge customer information. You will likely have duplicates. (This is not at all far fetched, considering YOU might have been the one who designed Company 2's schema and PK algorithms, after leaving Company 1. Furthermore, if your UNK solution is so damned good and bulletproof, it would stand to reason other companies would use it as a "best practice". How smart is it to design a PK algorithm that depends on it being so quirky no other company would use the same one?)

* Company direction and fundamental business rules changed. We are now strictly an e-commerce site, we no longer mail anything, and don't care what your physical address is. In addition, we've grown, and we now spam so far and wide that the likelihood of multiple people with the same name signing themselves up on the web on the same day is highly likely. What to do then? We don't want to scrap our system. Some disgruntled programmer is going to have to populate the address and/or date portion of the UNK with junk data, or expand the column to accommodate a more granular date/time, or re-think the UNK algorithm, or just re-architect system altogether to use a true system-generated PK once and for all.

(continued...)

Go to Top of Page

quazibubble
Starting Member

USA
25 Posts

Posted - 12/16/2002 :  23:32:33  Show Profile  Visit quazibubble's Homepage  Reply with Quote
So, you decide that, while you are still attached to the idea of an UNK, you need to isolate your relational model a little bit from the topsy-turvy world of ever-changing business requirements--not only in order to maintain the integrity of your design, but to even allow the business to change their rules in the first place. One obvious solution to consider is to use a high-resolution timestamp.

For example, your UNK could be the current date/time, in the format YYYYMMDDHHNNSSSS. Not too bad! It's system generated, does not depend on whimsical business rules, yet is still "natural" and descriptive of the entity (more or less, as it's insertion timestamp). But with multiple clients, or multithreaded processes on a scaled up database server, this still may not be enough.

Furthermore, it does not address the ability to extend the system sometime in the near or distant future to support replicated nodes, disconnected mobile users, and/or merged repositories. Although these are currently non-requirements (stated resolutely by the CIO himself), systems have a nagging way of lingering far longer than intended, outliving the tenure of it's original owners, and being extended well beyond it's original scope. Let's be a hero to our future brothers-in-arms, and not make life hell for them. We should always assume any and all of those things COULD happen, and design for it from the start, if we can do it without too much overhead in the present. So here we go:

* The timestamp is very useful as a highly unique element. This will form the backbone of our UNK.

* To support replicated nodes, disconnected mobile clients, and merged repositories, let's add a database identifier to the UNK. This will prevent the same timestamps from stepping on each other across multiple databases. The actual management of generating these unique database identifiers can be as simple as a single network-accessible Excel spreadsheet serving as an enumeration bank.

* To support the case where multiple users are hitting one scaled up database server, why not be a sport and some kind of user identifier to the UNK. This can be a user ID, or if that is actually what we are trying to generate, can be the windows login even though we don't store that elsewhere in the database. Since the combination of the timestamp and database ID provides most of the uniqueness, it is not terribly important that a user's login never changes.

* Our users might complain if they are not able to log in more than once from multiple machines. Or axiomatically, one user might log in to multiple browser sessions, hitting more than one server in a web farm. So lets add a machine identifier to the UNK (client or web server). This can be the OS machine name, the private IP address even if DHCP, or best yet a MAC address.

We could go on, adding thread IDs for client /web and servers, but we've already gotten to an exceptionally unique state of being. We can continue on and eliminate the remaining 0.000001% of cases where the unique constraint is still violated, but we've reached diminishing returns, and we must still be able to accommodate the occasional dupe without exploding.

(continued...)

Go to Top of Page

quazibubble
Starting Member

USA
25 Posts

Posted - 12/16/2002 :  23:34:29  Show Profile  Visit quazibubble's Homepage  Reply with Quote
So we've determined that our new UNK is composed of a date/time stamp, a database identifier, a machine identifier, and a user identifier. Let's make it case-insensitive too. Our formula is now:

UCASE(<Timestamp><User><Client machine><DB instance>)

Are you with me so far? Doesn't sound half unreasonable, does it. If not, re-read the previous section and tell me exactly what part is unreasonable.

We have a useful UNK that is not subject to conflict with business rules. (One could argue the user and client can change. But in this case, the primary data of relevance is the timestamp, we don't really care if the user and client names change someday, it won't hurt uniqueness, and it's still meaningful. This is much different than saying it's OK for a username that solely composes a UNK to change.)

Now, how to represent it? Since we want the UNK to be meaningful (to fulfill the definition of an UNK), we must be able parse it out to it's original constituents if need be. If we just run variable-length strings together, we will lose that ability. So we can achieve this by either using fixed-width blocks for each portion of the key, or by using an arbitrary delimiter. For example:

Starting with:

Timestamp: "2002-11-23 17:56.0075"
User: "Landry21.BobThomas"
Client machine or web server: "66.145.1.10"
Database instance: 8

We could strip out unnecessary data, pad the IP address so that it's octet divisions are not lost, convert everything to upper case, and concatenate them to:

"2002112317560075.LANDRY21BOBTHOMAS.066145001010.8"

With me so far? Is there anything you don't like so far, other than the key being way too long? (Don't worry, we'll fix that.)

Remember our discussion of the staggering waste involved with storing only 0-9, A-Z--for a total of 36 possibilities--in a character byte, each of which can represent 255 possibilities? In our case, we have 47 significant characters offering up 1.28x10^113 possible values, but are only able to represent 1.4x10^73 distinct values (which is nothing to scoff at, being only a few orders of magnitude less than the estimated number of subatomic particles in the known universe). We can do something to either regain some of those discreet values, or shorten the field. (Since there is not enough time in or matter in creation to exhaust these values, let's just shorten the field.) Although an irreversible hash would be a FANTASTIC choice when starting with such a large number, unfortunately we've already decided against that before, due to it's...well, irreversibility (which would clearly violate the UNKers' stated creed).

(continued...)

Go to Top of Page

quazibubble
Starting Member

USA
25 Posts

Posted - 12/16/2002 :  23:37:50  Show Profile  Visit quazibubble's Homepage  Reply with Quote
Instead of this somewhat lengthy string, let's represent the UNK with a slightly more compact 62-bit number (represented with SQL-friendly 0-9, A-Z, and a-z). To further maximize space, we'll start with each block's most efficient natural base, for example:
* Numeric blocks such as stripped timestamp: alpha to base-10 to base-62.
* Alphanumeric blocks: alpha to base-36 to base-62.
* Hexadecimal blocks (such as a MAC address): alpha to base-16 to base-62.
* Binary blocks (such as IP address): alpha to base-2 to base-62.
These operations are fully reversible to meet most of the UNKers' requirements for an UNK.

We will use the following logic to construct our key. This still wastes space in a 16-bit character array, but is more compact and efficient than straight alphanumeric. Here we go:

<Timestamp>:
YYYY-MM-DD HH:MM:SS.00
Stripped and converted from alpha to base-10 to base-62
ex: "2002-11-23 17:56.0075" = "9AWJE1Gkl"

<User>:
user name
first 9 characters, stripped, UCASE'ed, then from base-36 to base-62
ex: "Landry21.BobThomas" = "TNbIxY6"

<Client machine> one of the following (in order of preference):
MAC address
last 5 hexets from string to base-16 to base-62
ex: "9F:A6:5B:C2:10" = "0C4SeqgK"
IP address
last 4 octets converted from string to binary to base-62
ex: "66.145.1.10" = "1DZyV0"
machine name
last 9 characters UCAS'Ed, then from base-36 to base-62
ex: "CDesktop01" = "Ak5DROQD"

<Database instance>:
base-10 number converted to base-62
ex: 8 = 8

For all that, we go from:

Timestamp: "2002-11-23 17:56.0075"
User: "Landry21.BobThomas"
Client machine or web server: "66.145.1.10"
Database instance: 8

To:

"9AWJE1Gkl.TNbIxY6.1DZyV0.8"

Much better. Granted, we did wind up truncating the user name block, but we risked compromising uniqueness by the most negligible, and quite acceptable factor. (Note: to shrink the user name block, it would have been better to hash it rather than truncate it, but again, at the expense of irreversibility and voiding the UNK creed).

What we wound up with is a more compact string representation, no bigger than a typical full name, and MUCH smaller than many of the ridiculous suggested UNKs I've seen suggested on this forum. It is virtually guaranteed to be unique across time and space, without using awkward multiple foreign keys, and it preserves the reversibility and distinctness of each original attribute (except for a shortened user name). Unlike an Identity or a less unique UNK, this key can be merged, replicated, syncopated, synchronized, quadrasized, and partially hydrogenized, across nearly every inch of space and millisecond of time.

(continued...)

Go to Top of Page

quazibubble
Starting Member

USA
25 Posts

Posted - 12/16/2002 :  23:39:03  Show Profile  Visit quazibubble's Homepage  Reply with Quote
Although the formula can be easily and quickly reversed to show the original attributes, I promise on your mother's sullied virtue that you'll find absolutely no reason to do such a thing in the real world. You will come to realize, as countless others have before us, that you don't need to show a damn key to explain anything to you or your users. You just need it to build relationships under the covers. That's it, and that's all.


With this simple demonstration, the mathematical proof, and the points made prior, you can finally discard the ignorant notions that:

* An UNK will magically add value and meaning to the user experience, in spite of all demonstrated evidence to the contrary.

* Performance overhead will somehow be improved, in direct violation of mathematics and the physics of disk I/O, as I've just demonstrated.

* You will somehow be able to maintain the application as business rules constantly change underneath you, contrary to what I've pointed out.

But again, if you are still unwilling to let go, the meaningful data is available for you to use with the cumbersome formula I have provided.

(continued...)

Go to Top of Page

quazibubble
Starting Member

USA
25 Posts

Posted - 12/16/2002 :  23:41:08  Show Profile  Visit quazibubble's Homepage  Reply with Quote
But w..a..i..t.. a minute. Hold on! Hold on just a galldurned minute. This is beginning to reek remarkably like "reinventing the wheel". I hate that smell. Sharp-witted readers will realize this gradual evolution, and it's final destination, are not new concepts. The unique nature of very large values have been recognized for hundreds of years, and is now built in to SQL Server and Windows in the form of the GUID. As you probably know, a GUID is a 128-bit "globally unique" value. It is the hashed result of many of the same things we've just tinkered with, such as MAC address and time. But many people dismiss using a GUID because they think it's too big. Big mistake!

DO NOT LET THE DISPLAYED LENGTH OF A GUID FOOL YOU. It is much, much smaller than it looks. Let's take a moment to compare:

UNK:

We have accepted that we require a certain relative minimum length to gain enough uniqueness, which is still quite attractive compared to other ridiculous UNKs proposals floating around. It is displayed with 26 character positions. It contains 23 significant digits of 62 possible values each--which, while not taking full advantage of it's 8-bit character data type--can be stored, displayed, and queried using simple strings and friendly characters.

Total space consumed = 208 bits
Possible unique values = 1.68x10^41

GUID:

Displayed with 38 character positions--a good deal wider than our UNK. But don't be fooled (I can't tell you how many people are). Take a good, close look. It contains 32 significant digits of only 4 bits each. This yields:

Total space consumed = 128 bits
Possible unique values = 1.84x10^19

(continued...)

Go to Top of Page

robvolk
Most Valuable Yak

USA
15688 Posts

Posted - 12/16/2002 :  23:42:52  Show Profile  Visit robvolk's Homepage  Reply with Quote
quazibubble-

You filled out the notebooks for the movie Seven, didn't you?

Go to Top of Page

quazibubble
Starting Member

USA
25 Posts

Posted - 12/16/2002 :  23:43:32  Show Profile  Visit quazibubble's Homepage  Reply with Quote
Lessons to be learned:

* A GUID allows for 1.1x10^-22 times fewer values than our UNK (but still mind-bogglingly large), while requiring 38% less space.

* A GUID consumes exactly as many bits as it does to store as the following string:
"IBM INCORPORATED"

* Previously I've mentioned the fact that nothing is wasted with Identity columns, and that if it weren't for the fact that they auto-increment and are limited to 4.3 billion unique values, they would be the most efficient primary key possible, with perfect uniqueness "bang" for your column width "buck". You should now realize that GUIDs come very close to the same promise as well. They are small, pack an overwhelming number of unique values, and waste not a single bit (unlike hugely wasteful UNKs). (Although it's quite true that by the time you approached the exhaustion of it's available pool of values [say, if you were indexing every subatomic particle in the solar system], you would start getting an unreasonable number of duplicate key errors.)

* 128-bit GUIDs are only 4 times the size of the int data type. Yet real-world join performance is nowhere near 4 times as costly. In addition, the next generation of 64-bit processors will penalize 32-bit joins, and further cut the cost of GUIDs. The next step after that will all but require that PKs are done with 128-bit GUIDs. Even with 128-bit processors, there will still be extra storage and I/O overhead with 128-bit GUIDs, but compared to the size of the overall record including Unicode data, a measly 96 bits is truly "chump change".

* Remember, the relevant comparison here is not between Identity columns and GUIDs, but between UNKs and GUIDs. I challenge you to produce smaller UNKs than 128 bits, without resorting to hashing (in which case it would no longer be an UNK).

(continued...)

Go to Top of Page

quazibubble
Starting Member

USA
25 Posts

Posted - 12/16/2002 :  23:44:02  Show Profile  Visit quazibubble's Homepage  Reply with Quote
I've clearly proven the theory and demonstrated the practical applications demonstrating the absurdity of the "natural key", aka the UNK. Furthermore, I've demonstrated beyond a shadow of a doubt the superiority of the GUID for use in transactional tables.

If you are still somehow of the irrational, dogmatic conviction that UNKs are the way, you either haven't read carefully enough, have your ears covered while singing "la la la la la la", or simply beyond hope--destined to continue propagating broken systems until your final, merciful death.


So is that the end of the story?

Not yet!

I've demonstrated that GUIDs are the preferred choice for transactional entities. But that leaves the two other major entity categories unaddressed:

* Static, read-only, denormalized OLAP-style tables, typical of a data warehouse. This is simple. The preferred PK? No PK at all. We have already discarded the "rules of normalization" which axiomatically require a single PK per entity, because the rules are entirely different.

* Look-up reference-type entities that are read often, but rarely updated. What to do in this case? Read on.


The answer of to the lookup table PK has been in front of us since the dawn of the computer era, and indeed since the realization of the most rudimentary mathematics:

(continued...)

Go to Top of Page

quazibubble
Starting Member

USA
25 Posts

Posted - 12/16/2002 :  23:46:13  Show Profile  Visit quazibubble's Homepage  Reply with Quote
ENUMERATION

You may say, "you're trying to sell us on using Identity columns after all!". No, I'm not. Forget Identity columns finally, would you? Never use them. Ever.

One charming fellow responded to my first post, by stating that if we are to not use UNKs as PKs, programmers would be somehow force to "hard-code magic numbers" into the system. (However, he failed to elaborate, nor did he demonstrate how he proposed to resolve his fallacious conundrum. I think he was just high.)

However, ever the vigilant master of theory education and practical demonstration, I will show you how to take advantage enumeration as well, to build more maintainable systems.


Scenario:

1) You have a simple lookup table, ORDER_STATUS.

2) You have a transactional table called CUSTOMER. One of it's columns is a foreign key into ORDER_STATUS.

3) You have built a workflow service which applies complex, business rule-driven logic to the CUSTOMER table, and one of the criterion it considers is the customer's order status.

The challenge presented is that your workflow service must be aware of the intrinsic meaning of all of the statuses in ORDER_STATUS. Do you use a GUID as the foreign key into ORDER_STATUS, and hard-code it's value in your code? No, that would fall under the category of "sh*tty-a** code" (unless those particular GUID values were for some reason accepted industry-wide as a best-practice for representing particular statuses...not likely).

So do you store the actual order status text as the UNK (both PK and FK), as goofball told us is the best way? Of course not, not only would it break the rules of normalization and create the same headaches I listed before, it's just plain stupid. You'd surely be fired, laughed at, or both. So forget that.

So what to do? We were almost there (and I know you probably already are), but let's approach it from the other direction to close the loop. What do programmers do to represent "magic numbers" to other programmers? They enumerate! All modern programming languages a chock full of standard enumerations. (Before there were enumerations, they were called constants--different implementation, but having the same objective and programmer benefit.)

Somehow, your code and the table must speak the same language. Ultimately, the same values must exist in both places, imbued with the same meaning. As you might guess, the simpler and more abstract that representation, the easier it is to maintain both sides. Strings (UNKs or GUIDs) are not good. Identity columns are as evil in this context as they are with transactional entities, but are still a definite no-no--the programmer is thrown to the mercy of the database engine. (While it is easy to code values based on what the system generates for the database, the reverse is an exercise in madness.)

(continued...)

Go to Top of Page

quazibubble
Starting Member

USA
25 Posts

Posted - 12/16/2002 :  23:47:04  Show Profile  Visit quazibubble's Homepage  Reply with Quote
The solution? One more time, repeat after me: Enumeration. Again: Enumeration. But you've probably figured that out by now.

Let's step back for a moment and put the theory to a practical demonstration. To illustrate, here's our order status and customer table designs (don't worry about debates over indexing, constraining, and clustering--the point is to make it easy to read while getting the point across):

TABLE ORDER_STATUS

ORDER_STATUS_ENUM int NOT NULL
PRIMARY KEY CLUSTERED,

VALUE varchar(50) NOT NULL
UNIQUE NONCLUSTERED

This approach is quite workable. But we are going to follow the savvy lead set by many shrink-wrapped enterprise systems, by centralizing all enumerated lookups into an ENUMS table, AKA a "list of values" repository:

TABLE ENUMS

ENUM_TYPE varchar(50) NOT NULL,

ENUM int NOT NULL,

VALUE varchar(50) NOT NULL

CLUSTERED INDEX ON ENUMS (ENUM_TYPE)
UNIQUE NONCLUSTERED INDEX ON ENUMS (ENUM_TYPE, ENUM)
UNIQUE NONCLUSTERED INDEX ON ENUMS (ENUM_TYPE, VALUE)

And finally, the customer table with the enum tables PK as a foreign key:

TABLE CUSTOMER

CUSTOMER_GUID uniqueidentifier NOT NULL
PRIMARY KEY NONCLUSTERED
ROWGUIDCOL
DEFAULT NEWID(),

NAME_FIRST varchar(50) NOT NULL,

ORDER_STATUS_ENUM int

It is now up to the application administrator, the programmer, or both to define the values that will be used to represent each order status. For example:

ENUMS view:
"OrderStatus", 1, "Open"
"OrderStatus", 2, "In process"
"OrderStatus", 3, "Complete"

(Whether you choose to implement a centralized ENUMS table as I have done is up to you. It could have just been ORDER_STATUS without the ENUM_TYPE column.)

VB class module:
Public Enumeration OrderStatus
Open = 1
InProcess = 2
Complete = 3
End Enum

That's it. Simple huh. No crazy UNKs here. Just your basic enums.

(continued...)

Go to Top of Page

quazibubble
Starting Member

USA
25 Posts

Posted - 12/16/2002 :  23:48:33  Show Profile  Visit quazibubble's Homepage  Reply with Quote
Conclusion

After discussion and comparison of theory, and proof through demonstrative, real-world application (not to mention lots'o experience--it's OK to worship me now in spite of what I said earlier), it should now be CRYSTAL clear that:

* You should never, ever even think about using an UNK. Wipe "natural key" from your memory. The downsides are severely bad, and the upshots are severely bad. There is nothing to win, except the twisted self-satisfaction of being smugly different than your peers. I'm sure they will wish you well when you are looking for a new job.

* Transactional entities = GUID as the PK. No need to think, I've already done that for you. Just do. (Don't forget to define your additional, multi-column unique constraints independent of the PK, depending on your business requirements.)

* Lookup data = enumerated integer as a PK. (What have I told you about trying to use that pathetic excuse for a brain? Just read my proof again and do it.)

* OLAP data = no key.


Got it?

Yours truly,
BT

Go to Top of Page

nr
SQLTeam MVY

United Kingdom
12543 Posts

Posted - 12/17/2002 :  03:46:58  Show Profile  Visit nr's Homepage  Reply with Quote
ID, Name, DOB, Boss ID, Boss DOB
"321-45-9874", "Bob Smith", "5/3/1962", "123-54-4789", "1/1/1934"
"322-45-9874", "Julie Smith", "1/1/1999", "123-54-4789", "1/1/1934"
"123-54-4789", "Boss D. Man", "1/1/1934", "000-00-0001", "1/1/0001"

After this haven't bothered reading the rest of the posts as it has nothing to do with natural keys - given this I doubt if the rest is worth reading - but maybe it was just a mistake.
Using existing data as a primary key rather than using an identity doesn't mean that you should not normalise the structure.
Part of normalisation is identifying unique elements that form the (natural) key.

Can see how quazibubble comes by his views if he thinks this is what's meant by a natural key though.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

Edited by - nr on 12/17/2002 03:48:30
Go to Top of Page

rihardh
Constraint Violating Yak Guru

Slovenia
307 Posts

Posted - 12/17/2002 :  04:03:08  Show Profile  Reply with Quote
Despite the fact that many would have you stoned to death for your rudeness and intolerance, I find the article quite good and logical.

I too was once stamped as cynical and without understanding to others who just started with MSSQL and posted every silly beginners question without ever considering opening a book.

But with time I found out that, the more I try to help (explaining best practices), the more I was cornering myself, giving away valuable information to people I don't even know and who might someday even, god beware, be my competition in the ER market.

I settled to giving away just as much information as needed, leaving the global problem open.

If the article of NK or "UNK" bothers you it's OK to comment that, but firing away like you did will change nothing. Believe me, I know (and many others do) the feeling when you open a 800 page highly praised book and find out that it's all crap and the author should consider farming instead. Or if you are reading posts of self proclaimed DBA's who's understanding of RDBMS ends at lunchtime.

I believe that your posts are a result of piled up rage which is hopefully gone now. I believe too that if you stick around you'll find that many members share your view (in a tempered way) and are willing to listen to every reasonable suggestion.

P.S.
next time shorter please

Go to Top of Page

jsmith8858
Dr. Cross Join

USA
7423 Posts

Posted - 12/17/2002 :  08:20:20  Show Profile  Visit jsmith8858's Homepage  Reply with Quote
So...

if I decide to use SS# as a primary key for an employees table...

and I wish to store the boss's SS# for each employee in the table...

then I MUST store the birthday of the boss in each employee record?

hmm....the things you learn!

So THAT'S why SS# is a bad key!


quote:

* You should never, ever even think about using an UNK. Wipe "natural key" from your memory. The downsides are severely bad, and the upshots are severely bad. There is nothing to win, except the twisted self-satisfaction of being smugly different than your peers. I'm sure they will wish you well when you are looking for a new job.

* Transactional entities = GUID as the PK. No need to think, I've already done that for you. Just do. (Don't forget to define your additional, multi-column unique constraints independent of the PK, depending on your business requirements.)

* Lookup data = enumerated integer as a PK. (What have I told you about trying to use that pathetic excuse for a brain? Just read my proof again and do it.)

* OLAP data = no key.



Sounds to me like you have a "complex process" to determine how to decide what your key is. You know, those things will box you into a corner, or so I've heard. And yours is *slightly* inflexible....

(I'm sorry to hear my little list of guidelines takes you several hours to go through.)

- Jeff
Go to Top of Page

quazibubble
Starting Member

USA
25 Posts

Posted - 12/17/2002 :  10:40:27  Show Profile  Visit quazibubble's Homepage  Reply with Quote
RE: The SSN key thing ([nr] and [jsmith8858]). I know it's a tough concept, but don't be distracted by the fact that I was trying to kill two little birds with one stone in my example:

1) Natural keys (even if a single column) violate normal forms 1, 2, and 3. For this point, you may visualize the example without the Boss DOB column.

2) Multi-column keys are bad. I hear alot of discussion around doing just that when discussing UNKs. E.g., the two-column foriegn key for boss.


[nr] sounds like just the kind of person who is afraid to have their convictions challenged. ("...haven't bothered reading the rest of the posts..." after stumbling on the first thing he didn't agree with, because he doesn't want to be challenged otherwise with follow-on analysis.)


[rihardh]: Thank you. It's not so much that I'm offended by bad design as a philisophical affront. (And contrary to what it might seem here, I'm really not an angry guy.) It's just that I've had to fix, or pay to fix, SO many broken systems, that were badly broken for the very reasons I've explained. And quite unnecessarily. It's amatuer architects who learn Access by accident in their job as a marketing analyst (who never intended to be a developer), read these silly posts on natural keys, and decide they know enough to switch to IT. That cause ME heartburn, headache, and money. And yes, I do let it get to me. I wish it wouldn't. That's why I'm so "familiar" with the consequences of bad design, because in one way or another, they have cost ME! These are not esoteric "think-tank" examples, I've had to fix each and every one of them, multiple times. I admit I am intolerant--intolerant to coffee-shop wanna-be architects who don't even understand the first three basic forms of normalization (other than how to say "I've got a normalized design" in meetings), but have decided they don't need to be "boxed in" by silly acedemic rules. They want to "innovate", and in the process box me into a stinking, hulking, unfixable piece of sh*t. But even though I feel better getting this off my chest, I do realize it will not solve anything, and broken designs will continue to be propagated.

That's one of the reasons I thank god for .NET! Visual Basic now actually has a "learning curve". People who can write a macro in Excel can no longer just wake up one day and deciding to get a job as a senior programmer. One can only hope that the next, .NET-ized version of SQL Server will be similarly more difficult to switch over from Access as well. Or that they make Access more difficult, but that will never happen.

jsmith8858 and the "complex process": if you think it's a "complex process" to understand whether you are dealing with an analytical, transactional, or static lookup table, I highly recommend you take some classes on databases (starting with "Introduction to MS-Access"). If you actually approach all three the same way, somebody is going to get fired after you deliver broken crap, that "somebody" meaning YOU. If you think the rules of normalization create inflexible designs, then hold that thought, and come back to me after you architect an enterprise system in scale and scope, without regard to them. Then tell me how "flexible and free" you felt! Gets kind of complex, huh? Better yet, try making some mundane change to the underlying ER structure to such a system which was designed without regard to the first 3 forms of normalization. Tell me how smooth it went. You know what the Trix bunny says...

BT

Go to Top of Page

nr
SQLTeam MVY

United Kingdom
12543 Posts

Posted - 12/17/2002 :  10:49:11  Show Profile  Visit nr's Homepage  Reply with Quote
quote:

[nr] sounds like just the kind of person who is afraid to have their convictions challenged. ("...haven't bothered reading the rest of the posts..." after stumbling on the first thing he didn't agree with, because he doesn't want to be challenged otherwise with follow-on analysis.)



Nope - I didn't read on for the reason I gave.
That I didn't think it would be worthwhile after a basic misunderstanding of the concepts.

I find it difficult to believe that using existing data to form a primary key will always violate 1st normal form - but in that case I'm happy to violate that definition.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

quazibubble
Starting Member

USA
25 Posts

Posted - 12/17/2002 :  11:01:33  Show Profile  Visit quazibubble's Homepage  Reply with Quote
[nr] says: "I find it difficult to believe that using existing data to form a primary key will always violate 1st normal form - but in that case I'm happy to violate that definition"

Apparently you will find many things "difficult" to believe. But that is no reason to continue through life in blissful ignorance.

Your comment clearly demonstrates why you--ESPECIALLY you--should read the posts. (Or if you don't feel like reading them, then sit down and shut up.) And yes, you have simply stopped at the first sign of mental discomfort. Read on--you apparently will find quite a bit more discomfort, but your effort just may save me a massive migraine if I ever have to fix the broken crap you helped build. Do the world a favor. Get educated. In spite of the fact that most people are offended by my confrontational posting style, I assure you that many who've actually read the posts, are snickering at you at this very moment--not because of me, but because of your own display of overwhelming non-understanding.

BT

Go to Top of Page

robvolk
Most Valuable Yak

USA
15688 Posts

Posted - 12/17/2002 :  11:05:36  Show Profile  Visit robvolk's Homepage  Reply with Quote
quazibubble-

If you want to participate on SQL Team in a constructive, instructive manner, I'm all for it and appreciate your efforts in that regard. However, calling people "morons" and otherwise insulting their intelligence, simply because they don't agree with you, is not the way to do it.

If you cannot contribute in any other manner then SQL Team is not the site for you, nor are you the kind of contributor we need or want.

Go to Top of Page
Page: of 8 Previous Topic Topic Next Topic  
Previous Page | Next Page
 New Topic  Reply to Topic
 Printer Friendly
Jump To:
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC Go To Top Of Page
This page was generated in 0.36 seconds. Powered By: Snitz Forums 2000