Author |
Topic |
m.sunils
Starting Member
10 Posts |
Posted - 2007-05-06 : 22:11:29
|
I m designing a table which will hold about 40 million records.The data which is going to be inserted into the table is a alphanumeric unique code eg: (CABBXFGRET). Similarly there will be 40M codes, all are indexed starting with character 'C', with the second alphabet used as an index for about 5 million codes each.Is it advisable to have 1 table and load all the records in it or is it more sensible to split the codes in to various tables based on index.Eg: All codes starting with 'CA.....' till 'CC.....' in one table. etc.So that the search logic can be designed based on the index of the code.Logically considering this is helping us reduce the latency in search due to the heavy volume. But i m concerned that this doesnt follow the rules of normalization.Any suggestions from the DB gurus out there is very much appeciated.Thankssunil. |
|
jezemine
Master Smack Fu Yak Hacker
2886 Posts |
Posted - 2007-05-06 : 23:49:28
|
once the data is loaded, do you expect to do more updates and inserts, or is the table only read from? www.elsasoft.org |
 |
|
m.sunils
Starting Member
10 Posts |
Posted - 2007-05-07 : 00:05:21
|
No more insertions, but there will be updations. |
 |
|
dinakar
Master Smack Fu Yak Hacker
2507 Posts |
Posted - 2007-05-07 : 00:23:38
|
Is that the only unique column in the DB or would there be additional columns that would be in the queried against?Dinakar NethiSQL Server MVP************************Life is short. Enjoy it.************************http://weblogs.sqlteam.com/dinakar/ |
 |
|
m.sunils
Starting Member
10 Posts |
Posted - 2007-05-07 : 00:26:12
|
quote: Originally posted by dinakar Is that the only unique column in the DB or would there be additional columns that would be in the queried against?Dinakar NethiSQL Server MVP************************Life is short. Enjoy it.************************http://weblogs.sqlteam.com/dinakar/
Yes its the only unique column in the table. |
 |
|
m.sunils
Starting Member
10 Posts |
Posted - 2007-05-07 : 00:30:06
|
quote: Originally posted by jezemine once the data is loaded, do you expect to do more updates and inserts, or is the table only read from? www.elsasoft.org
No more insertions, but there will be updations. |
 |
|
dinakar
Master Smack Fu Yak Hacker
2507 Posts |
Posted - 2007-05-07 : 00:36:28
|
I would recommend creating an IDENTITY column and using the combination of IDENTITY column and your code column as primary key with a clustered index. Please test thouroughly before deciding on a solution.Dinakar NethiSQL Server MVP************************Life is short. Enjoy it.************************http://weblogs.sqlteam.com/dinakar/ |
 |
|
m.sunils
Starting Member
10 Posts |
Posted - 2007-05-07 : 01:38:08
|
quote: Originally posted by dinakar I would recommend creating an IDENTITY column and using the combination of IDENTITY column and your code column as primary key with a clustered index. Please test thouroughly before deciding on a solution.Dinakar NethiSQL Server MVP************************Life is short. Enjoy it.************************http://weblogs.sqlteam.com/dinakar/
I have an identity column in place, but cant have a composite primary key because it defeats the purpose. This would potentially allow duplicate codes in the column.Logically the soln. of splitting the records in to multiple tables is improving my search as the no. of records i have to span will be less in a table (approx. 5 Million) compared to 40 M in a single table.But my question is, that if its a good practice to split the records in multiple tables, does this follow the norms of an RDBMS?And if there is any solution to such problems.Simply given the amount of transactions in a bank, will be far more than my 40 M records, so how do they manage. Actually this was the question which made me rethink on my design.Anyways thanks.sunil. |
 |
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2007-05-07 : 02:23:29
|
quote: Logically the soln. of splitting the records in to multiple tables is improving my search as the no. of records i have to span will be less in a table (approx. 5 Million) compared to 40 M in a single table.
Only if a SCAN takes place. An INDEX SEEK will not matter (mostly)quote: But my question is, that if its a good practice to split the records in multiple tables, does this follow the norms of an RDBMS?
This has nothing to do with the normal forms. You have the logically model (which is normalised), how it is "stored" is irrelevant.quote: And if there is any solution to such problems.
Partitioned ViewsCan you show us an example of both a UPDATE query (e.g. update a "CA" row to a "CZ")Can you show us an example of a SELECT query (e.g where ID like 'CA%)I am inclined to suggest a single table... 40M is not that much of stretch for SQL Server DavidMProduction is just another testing cycle |
 |
|
dinakar
Master Smack Fu Yak Hacker
2507 Posts |
Posted - 2007-05-07 : 12:13:11
|
quote: Originally posted by m.sunilsI have an identity column in place, but cant have a composite primary key because it defeats the purpose. This would potentially allow duplicate codes in the column.
That is exactly why a composite key with identity value + your code will make it unique. identity value will never be repeated. so there is no issue of duplicates. this makes sense even more if you have duplicates because SQL Server has to do a scan or a lookup if there are too many duplicates. for example lets say you have a code CA123. and there are 5000 rows with same code. If you dont have another unique value all these values will be under the same leaf level. but if you had a distinct value along with the code, the combination will make it unique and the values will be spread out improving the performance.Dinakar NethiSQL Server MVP************************Life is short. Enjoy it.************************http://weblogs.sqlteam.com/dinakar/ |
 |
|
m.sunils
Starting Member
10 Posts |
Posted - 2007-05-07 : 17:51:07
|
Hi david,To elaborate more on the scenario. I have divided the records into 5 million each ie; tblCodesC_A_C will contain 5M records which will have all unique codes which start with CAXXXXXX ...., CBXXXX ...., CCXXXXXX ...Similarly tblCodesC_D_F and so on..The primary requirement is that codes should all be unique which we preload into the table. The structure of the table is like this.[sno, code, product_id, status, updatedDate]Where sno is a identity column, code is the preloaded unique code column, product_id specifies the product to which the code belongs and status specifies if the code is used or is available to be used, updatedDate is the date when the code gets used up. The tables should allow extra codes to be added to the category any time in future.Because of the unique code requirement i have made the "code" column a primary key.Coming to your questionsquote: Can you show us an example of both a UPDATE query (e.g. update a "CA" row to a "CZ")Can you show us an example of a SELECT query (e.g where ID like 'CA%)
First i check if the code has not been already used and if it exists based on the user input. Say eg: CAXXXXXX is the user input.By checking the first two alphabets i know that it belongs to "tblCodesC_A_C".my select query will be like this.set @status = 2"SELECT @status=status from tblCodesC_A_C with (nolock) where code = @code"Where @code is the user input of the unique code and status specifies if the code is already used or not. Status = 0 not used, 1 - used, 2 invalid code.So if the code is valid and is not used i subsequently updated the code as follows."UPDATE tblCodesC_A_C with(rowlock) set status=1 where code=@code"Here i m exclusively specifying locks as i observed that sometimes SQL goes in for lock escalation in cases of delay which is sometimes leading to deadlocks and DB becoming completely irresponsive.Plz give me some more idea regarding partioned views and how it can be done, any web references if possible.Thankssunil. |
 |
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2007-05-07 : 18:24:46
|
Sunil,Look in the SQL Server help file (BOL) for Partitioned Views. I could quote you something about giving a man fish versus teaching fishing... But I won't! :-)If you are not updating the key (code column in your example) and you have no new inserts then I think you are over engineering the problem. Base on what you gave, I still reckon the 1 table approach will be fine.DavidMProduction is just another testing cycle |
 |
|
m.sunils
Starting Member
10 Posts |
Posted - 2007-05-07 : 18:53:56
|
quote: Originally posted by byrmol Sunil,Look in the SQL Server help file (BOL) for Partitioned Views. I could quote you something about giving a man fish versus teaching fishing... But I won't! :-)If you are not updating the key (code column in your example) and you have no new inserts then I think you are over engineering the problem. Base on what you gave, I still reckon the 1 table approach will be fine.DavidMProduction is just another testing cycle
Well thanks for that piece of advise. Will try it out using a single table.Also i will try n learn fishing as u rightly advised, may help me in future. :-)But one last doubt, wont there be a difference trying to search a code in 40M records to searching in 5M records?Well i conclude this discussion with the answer i get for the above question.Thankssunil.Thanks,Sunil. |
 |
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2007-05-07 : 19:30:54
|
>>>But one last doubt, wont there be a difference trying to search a code in 40M records to searching in 5M records?There is only one way for you to find out....DavidMProduction is just another testing cycle |
 |
|
m.sunils
Starting Member
10 Posts |
Posted - 2007-05-07 : 19:35:04
|
quote: Originally posted by byrmol >>>But one last doubt, wont there be a difference trying to search a code in 40M records to searching in 5M records?There is only one way for you to find out....DavidMProduction is just another testing cycle
haaaaaaaaaa....Yeah i did tested it out practically and found that there was not much difference in the execution plan when i queried on 40 M records and 5M records.Thanks david for your right suggestion.Well thanks a lot for all the advises and contributions to everyone involved in helpming me out.Thanks,Sunil. |
 |
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2007-05-07 : 21:19:02
|
>>Yeah i did tested it out practically and found that there was not much difference in the execution planSunil, that is not a test. NEVER rely on the execution plan as a measure of performance.Fire up SQL Profilier, open several connections (at least 10) and smack the db from all connections concurrently using various parameters and TSQL statements that match several use cases..Rinse and Repeat for 50 and 100 connections... upto 200% of expected load.DavidMProduction is just another testing cycle |
 |
|
m.sunils
Starting Member
10 Posts |
Posted - 2007-05-07 : 21:57:37
|
quote: Originally posted by byrmol >>Yeah i did tested it out practically and found that there was not much difference in the execution planSunil, that is not a test. NEVER rely on the execution plan as a measure of performance.Fire up SQL Profilier, open several connections (at least 10) and smack the db from all connections concurrently using various parameters and TSQL statements that match several use cases..Rinse and Repeat for 50 and 100 connections... upto 200% of expected load.DavidMProduction is just another testing cycle
wow, i didnt quiet realize that. Yep will try it out.Thank you once again david.Thanks,Sunil. |
 |
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2007-05-07 : 22:09:11
|
>>Thank you once again david.No problems.In reality, a test can only be valid when in simulates the "real" situation.It is a very common mistake to test against a small data set or low concurrency environment.DavidMProduction is just another testing cycle |
 |
|
|