
 History table design


coolerbob
Aged Yak Warrior

841 Posts

Posted - 2005-04-08 : 04:48:17
I will be going live with an EventHistory table with 25 million records.
Growth will be 2 million records a year.
The table will mostly be used for recent data, with just sporadic queries on old data.
Replication will be a future requirement.
It has this structure:
HisID (uniqueidentifier); HisTypeEnum (tinyint); HisDate (datetime); HisDesc (varchar(255)); HisNotes (text)
I've tried wangling my way out of that last text column but the idea is selling well.

There is one link table to all the different tables that may be linked to an event:
LinkID (uniqueidentifier); HisID (uniqueidentifier); LinkObjID (uniqueidentifier); LinkObjTypeEnum (tinyint)

Generally the usage will be that as users select different things in the client application, they will immediately want to see the most recent history. If they start scrolling, we may need to go and get the rest for them. Performance is key here.

Obviously we don't need to accommodate updates on this history table.

I assume it would be a good idea for us to have at least one view for each table that has its history in this history table. So we would have vTableAHistory, vTableBHistory...
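For example, one such per-table view could be a thin filter over the link table. A sketch against the structure above; the view name and the assumption that LinkObjTypeEnum = 1 means "Company" are made up:

```sql
-- Sketch of one per-table history view. EventHistory/LinkTable columns
-- are from the spec above; the view name and the enum value 1 = Company
-- are hypothetical.
CREATE VIEW vCompanyHistory
AS
SELECT  H.HisID, H.HisTypeEnum, H.HisDate, H.HisDesc, H.HisNotes,
        L.LinkObjID AS CompanyID
FROM    EventHistory H
INNER JOIN LinkTable L ON L.HisID = H.HisID
WHERE   L.LinkObjTypeEnum = 1   -- hypothetical: 1 = Company
```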

I'm also going to try to put my foot down on history older than 10 years. We still can't delete it, though. How hard is it to implement data warehousing for this? Would that be the way to go?

I have considered a partitioned view, but I have a problem. Some tables that have their history stored in this history table have records that haven't had any history for 2 years. Others have had lots. It's quite random. So the view will often not be able to benefit from just getting data from the most recent event history table.

Also, I don't quite get this. Can someone explain it to me:
"A partitioned view must be partitioned over a column that is part of the primary key,
and for queries to draw advantage of the partitions the selection criteria should include the partition column(s)."
My understanding is that you partition a table by rows. You don't split the columns into more than one table. So I don't quite get this quote.
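In concrete terms the quote seems to describe something like the following sketch. The member table names, the yearly split, and the trimmed column list are invented here; the point is that the partitioning column (HisDate) carries a CHECK constraint and must sit in the primary key, which is what the quote is saying:

```sql
-- Sketch of a SQL 2000 partitioned view; member table names and year
-- boundaries are invented. The CHECK constraint on HisDate is what lets
-- the optimizer skip member tables, and HisDate must be part of the
-- primary key -- hence the quote.
CREATE TABLE EventHistory2004 (
    HisID   uniqueidentifier NOT NULL,
    HisDate datetime NOT NULL
        CHECK (HisDate >= '20040101' AND HisDate < '20050101'),
    HisDesc varchar(255) NULL,
    CONSTRAINT PK_EventHistory2004 PRIMARY KEY (HisDate, HisID)
)
-- ...one such table per year, then:
CREATE VIEW vEventHistory
AS
SELECT HisID, HisDate, HisDesc FROM EventHistory2004
UNION ALL
SELECT HisID, HisDate, HisDesc FROM EventHistory2005
```

A query with WHERE HisDate >= '20050101' would then touch only EventHistory2005; a query with no HisDate predicate scans every member table, which is the randomness problem mentioned above.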

I also need to find the quickest way to populate the history table with 50 million records to test the solution's ability to deal with future growth in its lifetime. The way I tried yesterday was still running this morning and hadn't done jack. Something to do with BCP maybe? Dunno.

If you need any other details, please don't hesitate to ask.

I really need to avoid the anti-patterns.

AndrewMurphy
Master Smack Fu Yak Hacker

2916 Posts

Posted - 2005-04-08 : 06:54:15
"The way I tried yesterday was still running this morning and hadn't done jack"...did you put in regular COMMIT points???

INSERT LARGE RANDOM DATA:
Set up a small table with the required input data, plus a 'SEQUENCE/TALLY' table full of numbers 1-99999 (etc.). Search here for "TALLY" for descriptions (and samples + benefits) of same. Then CROSS JOIN your input data with the TALLY table to generate a large dataset.
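A batched version of that might look like the following. This is only a sketch: SeedData, Tally, and the batch sizing are all made up, and NEWID()/DATEADD just fabricate plausible values. Each INSERT autocommits on its own, which is the "regular COMMIT points" idea above.

```sql
-- Sketch: bulk-generate test history by cross-joining a small seed table
-- with a Tally numbers table, in batches so each INSERT commits on its
-- own and the log doesn't balloon. SeedData, Tally, and the sizes here
-- are hypothetical.
DECLARE @batch int, @batches int
SELECT @batch = 0, @batches = 50          -- 50 batches x 1,000,000 rows = 50M
WHILE @batch < @batches
BEGIN
    INSERT INTO EventHistory (HisID, HisTypeEnum, HisDate, HisDesc)
    SELECT  NEWID(),                                   -- random GUID key
            s.HisTypeEnum,
            DATEADD(minute, -(t.N + @batch), GETDATE()), -- spread the dates out
            s.HisDesc
    FROM    SeedData s                                 -- e.g. 10 sample rows
    CROSS JOIN Tally t                                 -- numbers table
    WHERE   t.N <= 100000                              -- 10 x 100,000 = 1M rows/batch
    SET @batch = @batch + 1
END
```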

coolerbob
Aged Yak Warrior

841 Posts

Posted - 2005-04-08 : 07:40:29
Already doing a tally; wasn't committing. Will try that.

coolerbob
Aged Yak Warrior

841 Posts

Posted - 2005-04-08 : 09:19:28
I am however desperate to come up with the right physical implementation here.
Anyone got any bright ideas?

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2005-04-08 : 09:31:56
how will the front end be implemented? asp.net? winforms?
internal LAN, over internet?

Go with the flow & have fun! Else fight the flow

coolerbob
Aged Yak Warrior

841 Posts

Posted - 2005-04-08 : 09:51:08
Winforms (500 concurrent internal users) on LAN
The usage description I gave earlier was for these internal users.

&

ASP.NET (1000 concurrent external users over internet)
These would rarely need to see back more than a year - if ever.

That user count is double what we currently have, but it will need to work at that level as the company continues to grow.

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2005-04-08 : 10:06:25
so basically your problem is how to display data to the user fast enough?

Go with the flow & have fun! Else fight the flow

coolerbob
Aged Yak Warrior

841 Posts

Posted - 2005-04-08 : 10:14:30
Correct

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2005-04-08 : 10:28:00
well as i understand you (i might be wrong) you want to display data fast enough for users to scroll down to.
have you thought about paging? that would solve your problems, no?
if that's not an option, you can build an intermediate layer that retrieves older data based on where the user has scrolled to.
i.e.: if the user scrolls down to record 520, your layer asynchronously brings back records 2000-3000, and so on. so you have a kind of N-next-records buffer.

it's just an idea...

Go with the flow & have fun! Else fight the flow

coolerbob
Aged Yak Warrior

841 Posts

Posted - 2005-04-08 : 10:34:41
I've typed "paging" into BOL. Didn't come up with an explanation. Do you know where I can read about "paging"? I don't really know what it is.
I can't see how the "intermediate layer" would meet my criteria of bringing back the initial data super-fast.

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2005-04-08 : 10:47:35
paging is used on the client side to view large quantities of data.
you've probably seen it on the net: page N of M.
you use a sproc to access only the necessary page, which has a limited number of rows.
search for paging here. it's a hot topic.
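On SQL 2000 (no ROW_NUMBER()) such a sproc is often written with SET ROWCOUNT. A sketch, with made-up names, ordering on HisDate; note that ties on the date can make pages overlap slightly:

```sql
-- Sketch of a SQL 2000-style paging proc using SET ROWCOUNT.
-- Proc name and parameters are illustrative; assumes an index on HisDate.
-- Ties on HisDate can duplicate a row across page boundaries.
CREATE PROCEDURE GetHistoryPage
    @PageSize int,
    @PageNum  int              -- 1-based
AS
SET NOCOUNT ON
DECLARE @Skip int, @FirstDate datetime
SET @Skip = (@PageNum - 1) * @PageSize + 1
SET ROWCOUNT @Skip             -- scan down to the first row of the page...
SELECT @FirstDate = HisDate FROM EventHistory ORDER BY HisDate DESC
SET ROWCOUNT @PageSize         -- ...then return just that page
SELECT HisID, HisDate, HisDesc
FROM   EventHistory
WHERE  HisDate <= @FirstDate
ORDER BY HisDate DESC
SET ROWCOUNT 0                 -- always reset
```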

about the layer. the data would be in memory already so the access would be fast.

Go with the flow & have fun! Else fight the flow

coolerbob
Aged Yak Warrior

841 Posts

Posted - 2005-04-08 : 11:10:10
I'll have a rummage for paging.

"about the layer. the data would be in memory already so the access would be fast."
Ok, but the problem is the speed of getting it from the db into client memory the first time

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2005-04-08 : 11:18:35
well i don't know how slow getting a few tens of records is.... not so slow if you ask me....

Go with the flow & have fun! Else fight the flow

coolerbob
Aged Yak Warrior

841 Posts

Posted - 2005-04-08 : 11:23:42
not even in a table with 50 million records?
does paging work with GUID columns instead of identity?
Using paging, I would just have one monolithic history table instead of using partitions and views then?

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2005-04-08 : 11:28:50
yes. but have you thought about horizontally partitioning your data?
instead of having one table with 50 million records, have N views with 100k rows from that table.
you can use identity to split the data.
i suggest this because i have no idea how fast paging will work for rows 45,000,520 to 45,000,540 if you have a 20-row page size...
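One way around the deep-page worry (a sketch, not something proposed in the thread so far): instead of counting down to row 45,000,520, have the client remember the last HisDate it received and seek from there, so each fetch reads only one page no matter how deep it is:

```sql
-- Sketch: "keyset" paging. @LastSeenDate is the HisDate of the last row
-- the client already has; with a clustered index on HisDate this seeks
-- straight to the next page instead of scanning past millions of rows.
-- Names are illustrative; ties on HisDate need a tie-breaker column.
SELECT TOP 20 HisID, HisDate, HisDesc
FROM   EventHistory
WHERE  HisDate < @LastSeenDate     -- value carried over from the previous page
ORDER BY HisDate DESC
```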

Go with the flow & have fun! Else fight the flow

coolerbob
Aged Yak Warrior

841 Posts

Posted - 2005-04-08 : 11:31:26
"yes. but have you thought about horizontally partitioning your data?
instead of having one table with 50 million records, have N views with 100k rows from that table.
you can use identity to split the data."

Sorry, I don't understand that at all. Can you show me?
Looking back at my spec:
HisID (uniqueidentifier); HisTypeEnum (tinyint); HisDate (datetime); HisDesc (varchar(255)); HisNotes (text)

Exactly what are you suggesting?

coolerbob
Aged Yak Warrior

841 Posts

Posted - 2005-04-08 : 11:48:56
Do you mean have one table but lots of views that look at it in 100,000-row chunks?

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2005-04-08 : 11:56:28
yup...

Go with the flow & have fun! Else fight the flow

coolerbob
Aged Yak Warrior

841 Posts

Posted - 2005-04-08 : 12:12:38
1) That's not "normal" horizontal partitioning, is it? "Normal" is when you split a large table into smaller tables and then use a view (that does a UNION) if you need to look at them all together.

2) Are you suggesting I use Views to partition the data and use paging?

3) What do you mean when you said "you can use identity to split the data"? Like I said, we will be going to SQL replication soon, so that's why we chose GUIDs over identity. I can't really bring identity back into the melting pot, can I?

4) How would I deal with a situation where I went:
SELECT A.*, C.*
FROM viewMostRecentHistoryTable A
INNER JOIN LinkTable B ON B.HisID = A.HisID
INNER JOIN Company C ON C.CompanyID = B.LinkObjID

but it comes back with nothing because all this company's history is in the other views?

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2005-04-08 : 12:29:52
1. well, no, it isn't normal. but i have no idea what else to call it...
2. yes. i'm still convinced that paging is the way to go here. i mean, transferring thousands of rows to the client will bottleneck the network.
3. i meant it as an example. you can use any kind of column to split the data, if it can be split on. i.e. datetime...
4. hmmm... i see what you mean.... emmm guys... any other ideas here???

Go with the flow & have fun! Else fight the flow

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2005-04-08 : 13:52:24
Without a clear definition of your logical model, it is hard to help verify if this is a good table design.

Why are you worrying about partitioning your data and creating partitioned views and all that? Determine and CLEARLY define your logical model, translate that into your physical table structure, be sure to add all indexes and constraints, and then try it out. Make sure the data fits, make sure it flows correctly and logically, and make sure that you have full data integrity.

Then, and only then, fill it up with test data and test the various SELECTs that you will need. Determine the efficiency of each, and revisit your indexes to see what you can tweak. Also, do not forget to consider tweaking your SELECTs themselves to be sure they are written efficiently.

Finally, after all that is done, at the very END of that entire process, is when you should first start considering somewhat drastic measures if you need better performance -- i.e., partitioning tables or denormalizing your schema.

- Jeff