Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
coolerbob
Aged Yak Warrior
841 Posts |
Posted - 2005-03-31 : 03:04:51
|
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=47738So along comes PW and gets my Topic locked because he arrogantly jumps to the wrong conclusion!!! Thats the first bad experience on this site and hopefully the last!THIS IS A COMPLETELY DIFFERENT ISSUEI am not talking about comma seperated values here!Perhaps Rockmoose will be allowed to continue with his CONSTRUCTIVE attempt:Table1------OldID NewGUID1 {8E2F9FE5-F382-49B5-B439-81A0258E5251}2 .....Table2------OldID NewGUID1 {8E2F9FE5-F382-49B5-B439-81A0258E5252}2 .....Table3------OldID NewGUID1 {8E2F9FE5-F382-49B5-B439-81A0258E5253}2 .....Table4------OldID NewGUID1 {8E2F9FE5-F382-49B5-B439-81A0258E5254}2 .....Table5------OldID NewGUID1 {8E2F9FE5-F382-49B5-B439-81A0258E5255}2 .....OldHistoryTable---------------ID TheDate Tbl1ID Tbl2ID Tbl3ID Tbl4ID Tbl5ID1 13/5/89 1 3 2 1 202 14/5/89 4 5 6 8 23 ...NewHistoryTable---------------ID TheDate Tbl1NewGUID Tbl2NewGUID Tbl3NewGUID Tbl4NewGUID Tbl5NewGUIDDoes that help? |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2005-03-31 : 07:47:02
|
| Bob, relax. I locked the topic so that people follow the original thread and not the new one, we do it all the time. Unless you are substantially changing the topic, please post follow-ups in the original thread, don't start another. It's too easy for readers to get lost. Ignore PW. |
 |
|
|
coolerbob
Aged Yak Warrior
841 Posts |
Posted - 2005-03-31 : 08:43:37
|
| I still say this is a seperate issue and merits a discussion in its own right. |
 |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2005-03-31 : 09:41:41
|
| What are you trying to here? What is the NewHistory table storing? I mentioned before about normalization, and you indicated you were a big believer/follower of it, so what are you trying to do here? All you need to do is SELECT from your old history table, and do a join to each of your OldID/NEwGuidID tables on each column, and insert the results into your new table:INSERT INTO NewHistoryTable (ID, TheDate, Tbl1NewGUID, Tbl2NewGUID, Tbl3NewGUID, Tbl4NewGUID, TblNewGUID)SELECT Old.ID, Old.TheDate, T1.NewGUID, T2.NewGUID, T3.NewGUID, T4.NewGUID, T5.NewGUIDFROM OldHistoryTable OldLEFT OUTER JOIN Table1 T1 ON Old.Tbl1ID = T1.OldIDLEFT OUTER JOIN Table2 T2 ON Old.Tbl2ID = T2.OldIDLEFT OUTER JOIN Table3 T3 ON Old.Tbl3ID = T3.OldIDLEFT OUTER JOIN Table4 T4 ON Old.Tbl4ID = T4.OldIDLEFT OUTER JOIN Table5 T5 ON Old.Tbl5ID = T5.OldIDBut, again, I really question your table structure and what you are tyring to accomplish here ...- Jeff |
 |
|
|
coolerbob
Aged Yak Warrior
841 Posts |
Posted - 2005-03-31 : 11:08:19
|
| Thanks for answering Jeff.Tell me if this is not 3rd normal form:I have a history table that stores records for things that took place.Those 5 ID's represent 5 different objects that were involved in that event eg if I dream a few entities/tables up for you: Companies, People, Contracts, Locations,...What other way is there to do this? Whats wrong with it? |
 |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2005-03-31 : 11:36:43
|
| can an event ever occur to more than 1 of those entities? I.e., can an event happen in more than 1 location or with more than 1 person?Typically, you'd have the master table of Events with only the data that is completely dependant on the event and is 1-1. (i.e., datetime like you mentioned, or description, or category, or whatever is fully dependant on that particular event and nothing else). Then you'd have related tables like EventPlaces or EventCompanies which store many-to-many relations between Events and the things you wish to link to the events. So, it an event can only take place in 1 location and for 1 company, put those values in your Events table. But if it can include many people or many departments, break those out into separate many-to-many tables, with primary keys of EventID/PersonID or EventID/DepartmentID for example.- Jeff |
 |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-03-31 : 12:53:44
|
quote: Originally posted by coolerbob Thanks for answering Jeff.Tell me if this is not 3rd normal form:I have a history table that stores records for things that took place.Those 5 ID's represent 5 different objects that were involved in that event eg if I dream a few entities/tables up for you: Companies, People, Contracts, Locations,...What other way is there to do this? Whats wrong with it?
I would like to elaborate a little on the points that Jeff raised on the design.And your question about 3NF.You are basically storing an event involving 5 entities; (EVENT,A,B,C,D,E)You table would be in 3rd normal form, but it would probably contain "multi-valued" dependencies.I say probably, because I don't know exactly what you are doing with this table.Consider this:If the event can involve say 2 "A"sEVENT A B C D Ee1 a0 b c d ee1 a1 b c d e Then the information that b,c,d,e was involved with e1 is stored twice!The solution is to further normalize the database (4NF)Like Jeff suggested:EVENT Ae1 a0e1 a1EVENT Be1 bEVENT Ce1 cEVENT De1 dEVENT Ee1 e If You join those 5 tables on e1, You will in fact get this resultset:EVENT A B C D Ee1 a0 b c d ee1 a1 b c d e Now a note on "missing" information:-------------------------------------------------------------------------------------Furthermore, in the (EVENT,A,B,C,D,E) table it is not possible to store the information thatfor example b,c,d,e were involved with EVENT e2, but no A was present:(unless we introduce NULLS)EVENT A B C D Ee2 NULL b c d e But now we could theoretically get this in the database (due to NULLS, and what would the key be?):EVENT A B C D Ee3 NULL b c NULL ee3 a NULL c d NULL Which will make it really ugly to get the whole picture for e3.rockmoose |
 |
|
|
coolerbob
Aged Yak Warrior
841 Posts |
Posted - 2005-04-01 : 02:36:15
|
| OK, but isn't there quite a big trade-off in performance with all those extra tables? |
 |
|
|
coolerbob
Aged Yak Warrior
841 Posts |
Posted - 2005-04-01 : 04:09:59
|
| I'm getting battered continueously on how things like 3rd and 4th normal impact performance. The line here is that "JOINS cost a lot. Sometimes things can't be pretty and you have to duplicate data or commit some sin to spare yourself a performance costly JOIN." I'm getting told to go and read the book on the core SQL engine to find out how it uses buffers and everything else before I dare suggest that the peformance hit of JOINs generally are neligable and worthwhile.That's another reason why the table is the way it is at the mo. Any ammo for me anyone? Or are they right? |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2005-04-01 : 07:45:35
|
LOL, I love the irony of how THEY are telling YOU to go read a book on how SQL Server works. As if anyone there ever read Books Online (everything you need to know about buffers is in there with about 5 minutes' research) Perhaps you should point them to SQL Team. If they're so sure of their knowledge they'd thoroughly enjoy discussing it with us. Your coworkers are extraordinarily misinformed. I still don't understand why all of this crap is going on. If there's something wrong with this app that you are trying to fix, it's pretty obvious that their earlier approach doesn't work, or else why would you fix it? Seriously, what is the purpose of your involvement? So far all of the arguments seem to be political rather than technical, basically they don't want to admit that they might have made a mistake in developing it.I'd suggest avoiding all of the arguments. Do your research on your own, don't make suggestions for changes to anyone, don't ask their advice on anything except simple answers to simple questions. If you can test new techniques without other's involvement, do so. No one at your company seems to want to help you, so avoid them. If you can demonstrate clearly that your method is better, that's the best you can do. They'll either accept it or not. |
 |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-04-01 : 09:16:32
|
>> The line here is that "JOINS cost a lotNo. SQL Server handles JOINS very well, with proper indexing, there is not much of a performance hit.>> Sometimes things can't be pretty and you have to duplicate data or commit some sin to sapre yourself a performance costly JOIN.You might gain som performance for some SELECT operations that can draw benefit from the denormalisation. Granted.For practically every other operation You will lose performance.Here is a testbed mimicking your situation as I understood it.I create a "broad" Event table,And a 4NF normalized schema with the same information.The test data is 100000 events:On my machine, the results were the following;Retrieving all data for 10 random Events;For 100000 events the normalized schema is faster(~3x)For 200000 events the broad table is faster(~3x)You can try the example if You want, just copy paste the code(the cleanup is commented away, so run that after You are done)-- "denormalized" event tableCREATE TABLE Event3NF( EventID UNIQUEIDENTIFIER PRIMARY KEY DEFAULT(NEWID()) ,EventDate DATETIME DEFAULT GETDATE() ,AID UNIQUEIDENTIFIER NOT NULL DEFAULT(NEWID()) ,BID UNIQUEIDENTIFIER NOT NULL DEFAULT(NEWID()) ,CID UNIQUEIDENTIFIER NOT NULL DEFAULT(NEWID()) ,DID UNIQUEIDENTIFIER NOT NULL DEFAULT(NEWID()) ,EID UNIQUEIDENTIFIER NOT NULL DEFAULT(NEWID()) ,FID UNIQUEIDENTIFIER NOT NULL DEFAULT(NEWID()) ,GID UNIQUEIDENTIFIER NOT NULL DEFAULT(NEWID()) ,HID UNIQUEIDENTIFIER NOT NULL DEFAULT(NEWID()))-- "fully normalized" event tablesCREATE TABLE Event4NF( EventID UNIQUEIDENTIFIER PRIMARY KEY DEFAULT(NEWID()) ,EventDate DATETIME NOT NULL DEFAULT GETDATE())CREATE TABLE EventA4NF( EventID UNIQUEIDENTIFIER REFERENCES Event4NF(EventID),AID UNIQUEIDENTIFIER DEFAULT(NEWID()), PRIMARY KEY(EventID,AID) )CREATE TABLE EventB4NF( EventID UNIQUEIDENTIFIER REFERENCES Event4NF(EventID),BID UNIQUEIDENTIFIER DEFAULT(NEWID()), PRIMARY KEY(EventID,BID) )CREATE TABLE EventC4NF( EventID UNIQUEIDENTIFIER REFERENCES Event4NF(EventID),CID UNIQUEIDENTIFIER DEFAULT(NEWID()), PRIMARY KEY(EventID,CID) )CREATE TABLE EventD4NF( EventID UNIQUEIDENTIFIER REFERENCES Event4NF(EventID),DID UNIQUEIDENTIFIER DEFAULT(NEWID()), PRIMARY KEY(EventID,DID) )CREATE TABLE EventE4NF( EventID UNIQUEIDENTIFIER REFERENCES Event4NF(EventID),EID UNIQUEIDENTIFIER DEFAULT(NEWID()), PRIMARY KEY(EventID,EID) )CREATE TABLE EventF4NF( EventID UNIQUEIDENTIFIER REFERENCES Event4NF(EventID),FID UNIQUEIDENTIFIER DEFAULT(NEWID()), PRIMARY KEY(EventID,FID) )CREATE TABLE EventG4NF( EventID UNIQUEIDENTIFIER REFERENCES Event4NF(EventID),GID UNIQUEIDENTIFIER DEFAULT(NEWID()), PRIMARY KEY(EventID,GID) )CREATE TABLE EventH4NF( EventID UNIQUEIDENTIFIER REFERENCES Event4NF(EventID),HID UNIQUEIDENTIFIER DEFAULT(NEWID()), PRIMARY KEY(EventID,HID) )GO-- insert test data-- create 200000 random events...PRINT 'Generating 100000 random events, (same info in "broad" table and in fully normalized schema) ...'; PRINT ''INSERT Event3NF(EventID) SELECT TOP 100000 NEWID() FROM master..spt_values t1 CROSS JOIN master..spt_values t2SET NOCOUNT ONGOINSERT Event4NF(EventID) SELECT EventID FROM Event3NFINSERT EventA4NF(EventID,AID) SELECT EventID,AID FROM Event3NFINSERT EventB4NF(EventID,BID) SELECT EventID,AID FROM Event3NFINSERT EventC4NF(EventID,CID) SELECT EventID,AID FROM Event3NFINSERT EventD4NF(EventID,DID) SELECT EventID,AID FROM Event3NFINSERT EventE4NF(EventID,EID) SELECT EventID,AID FROM Event3NFINSERT EventF4NF(EventID,FID) SELECT EventID,AID FROM Event3NFINSERT EventG4NF(EventID,GID) SELECT EventID,AID FROM Event3NFINSERT EventH4NF(EventID,HID) SELECT EventID,AID FROM Event3NFGO-- create view over the "fully normalized" schema, to mimic the "denormalized" schema-- this is just to make retrieval easy to presentCREATE VIEW vEvent4NF WITH SCHEMABINDING ASSELECT nf4.EventID ,nf4.EventDate ,anf4.AID ,bnf4.BID ,cnf4.CID ,dnf4.DID ,enf4.EID ,fnf4.FID ,gnf4.GID ,hnf4.HIDFROM dbo.Event4NF nf4 JOIN dbo.EventA4NF anf4 ON nf4.EventID = anf4.EventID JOIN dbo.EventB4NF bnf4 ON nf4.EventID = bnf4.EventID JOIN dbo.EventC4NF cnf4 ON nf4.EventID = cnf4.EventID JOIN dbo.EventD4NF dnf4 ON nf4.EventID = dnf4.EventID JOIN dbo.EventE4NF enf4 ON nf4.EventID = enf4.EventID JOIN dbo.EventF4NF fnf4 ON nf4.EventID = fnf4.EventID JOIN dbo.EventG4NF gnf4 ON nf4.EventID = gnf4.EventID JOIN dbo.Eventh4NF hnf4 ON nf4.EventID = hnf4.EventIDGO-- create small search table to retrieve data for 10 randomly chosen EventID's-- search these 10 random events from the 3NF, 4Nf tables;CREATE TABLE #searchevents(EventID UNIQUEIDENTIFIER)INSERT #searchevents(EventID) SELECT TOP 10 EventId FROM Event3NF ORDER BY NEWID()GO-- Start performance testing-- Over the "denormalized" table and the "fully normalized" tables--CREATE UNIQUE CLUSTERED INDEX ixXXX ON vEvent4NF(EventID,EventDate,AID,BID,CID,DID,EID,FID,GID,HID)DECLARE @d DATETIMEDBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE; PRINT '' -- Clear cached dataPRINT 'Counting total no of events from "broad" table...'SET @d = GETDATE()SELECT COUNT(*) AS CntEvents3Nf FROM Event3NFSELECT DATEDIFF(ms,@d,GETDATE()) AS elapsed_time3nf_Count/*DBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE -- Clear cached dataSET @d = GETDATE()SELECT COUNT(*) AS CntEvents4Nf FROM vEvent4NFSELECT DATEDIFF(ms,@d,GETDATE()) AS elapsed_time4nf_Count*/DBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE; PRINT '' -- Clear cached dataPRINT 'Counting total no of events from "4nf" table...'SET @d = GETDATE()SELECT COUNT(*) AS CntEvents4Nfb FROM Event4NFSELECT DATEDIFF(ms,@d,GETDATE()) AS elapsed_time4nf_CountDBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE; PRINT '' -- Clear cached dataPRINT 'Retrieving 10 random events from "broad" table...'SET @d = GETDATE()SELECT * FROM Event3NF WHERE EventID IN(SELECT EventId FROM #searchevents) ORDER BY 1,2SELECT DATEDIFF(ms,@d,GETDATE()) AS elapsed_time3nf_Select10DBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE; PRINT '' -- Clear cached dataPRINT 'Retrieving 10 random events from 9-table join of 4nf normalized tables table...'SET @d = GETDATE()SELECT * FROM vEvent4NF WHERE EventID IN(SELECT EventId FROM #searchevents) ORDER BY 1,2SELECT DATEDIFF(ms,@d,GETDATE()) AS elapsed_time4nf_Select10GO/*-- Clean upDROP TABLE #searcheventsDROP VIEW vEvent4NFDROP TABLE EventA4NFDROP TABLE EventB4NFDROP TABLE EventC4NFDROP TABLE EventD4NFDROP TABLE EventE4NFDROP TABLE EventF4NFDROP TABLE EventG4NFDROP TABLE EventH4NFDROP TABLE Event3NFDROP TABLE Event4NF*/GO rockmoose |
 |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2005-04-01 : 09:48:54
|
| As rockmoose states, remember that SQL Server and other relational databases are truly optimized for one primary goal: to efficiently process joins. That's the whole point of a RDMS, the reason it exists!- Jeff |
 |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-04-01 : 10:20:20
|
| Some more rantings...You should create a fully normalized logical model of your business data.And not initially worry about any (technical) performance issues.Once You have the model, then You are better prepared to design the physical implementation of it.IMC a fully normalized phyical implementation is best.If there arises a situation where it makes sense to precalculate and store informationfor faster retrieval, then that will be a physical implementation issue, and it can be solved in several ways.If You start by implementing a denormalized structure, the db will be much less flexible.rockmoose |
 |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2005-04-01 : 10:30:10
|
| I know I am badly paraphrases a common quote or saying, but it goes something like this: premature optimization is just about the worst thing a developer or database modeller can do.- Jeff |
 |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-04-01 : 11:30:17
|
quote: Originally posted by jsmith8858 I know I am badly paraphrases a common quote or saying, but it goes something like this: premature optimization is just about the worst thing a developer or database modeller can do.- Jeff
That was an optimization of my rant I dare say.I read that quote somewhere...[url]http://c2.com/cgi/wiki?PrematureOptimization[/url]rockmoose |
 |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2005-04-01 : 11:33:29
|
good link!- Jeff |
 |
|
|
coolerbob
Aged Yak Warrior
841 Posts |
Posted - 2005-04-05 : 09:16:47
|
| rockmoose,i ran that test. The 4nf way was about 6 times slower on my system - just like it was slower on yours. So that proves that breaking the rules will make it faster. But how big a deal is it that the 4nf way is slower? Will the users notice it? That's my main concern. The old database/system uses a denormalized db. I have one shot at the new history table and I have to make the right decision. I think it will be a nightmare to go and change it later. It will be too much work and will affect to many other parts of the project.How well does this 4nf approach scale? Used it on a big project with millions of history records and ten of these 4nf tables coming off the history table before? |
 |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2005-04-05 : 10:59:18
|
| >>So that proves that breaking the rules will make it fasterFaster SELECTs, sometimes, for very simple basic "list" queries w/o any math or complicated sorting or criteria/joins. A good example is: finding max value in a set of rows versus sets of columns -- that is much, much faster if your data is normalized and indexed properly. But just blindly joining everything there is and returning all rows, then denormalized might be faster. But with joins, filters, sorts, aggregates, etc -- normalized will just about always be faster using proper indexes and querying techniques.As for UPDATES, INSERTS, DELTES, sorts, data maintainence, imports, exports, data verification, and virtual every other operation a database can perform: Normalized will always be faster, simplier, and more efficient overall.The main point is, and this is important: never compare the efficiency of algorithms or programs that return different results. The one that returns correct results is ALWAYS "faster". Be sure to read the examples about data integrity issues of not normalizing in this case, and things like what I mentioned about more than 1 person attached to an event and how you will store it. Long story short: If using a particular technique requires that you can't store the data correctly or in an efficient manner, then there's no point in measuring the efficiency of that techinique.- Jeff |
 |
|
|
coolerbob
Aged Yak Warrior
841 Posts |
Posted - 2005-04-05 : 12:15:32
|
| So in your opinion guys, which is a better design:1. A Central History Table StrategySo you would have:EventHistory Table with columns: EventGUID, EvDesc, EvNotes, EvDateEventTypes Table with columns: LinkGUID, EventGUID, TypeAnd then all the link tables like you suggested:EventCompaniesEventContractsEventSitesand so on... I could end up with 10 of these tablesOR2. A History Table for each EntitySo you would have:CompaniesCompanyHistoryContractsContractHistorySitesSiteHistoryEach of these history tables would have columns for the other entities that this specific table could have interacted with. |
 |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2005-04-05 : 12:54:41
|
| It's hard to choose the best physical model for your database if we don't know the logical model. You need to be able to express, in regular English sentences, all of the relationships between the entities you are dealing with. I also don't understand you EventTypes table, with the LinkGUID and EventGUID columns ... if it is a table of types, in which you can classify each event, you wouldn't need a GUID or a link back to the Events table. It would work the other way around, Events link to an EventType.Again, answering the specific questions posed in this thread (can an Event have more than 1 person? What else is fully dependant on each event? etc) and more information is required. If you find as you go through this process you cannot fully state the relations between the various entities you are modelling very precisely from a logical standpoint, it is time to go back to the source and make sure you understand the requirements of the users and the data they need to store.- Jeff |
 |
|
|
coolerbob
Aged Yak Warrior
841 Posts |
Posted - 2005-04-06 : 03:46:35
|
| Wise words. I will do that. Thankyou very much. |
 |
|
|
Next Page
|
|
|
|
|