How to design the history table to be more efficie

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

SQL Server Development (2000)

How to design the history table to be more efficie

Author

Topic

hujiao
Starting Member

8 Posts

Posted - 2007-02-08 : 22:16:01

I am running a website of crossword puzzle and Sudoku games. The website is designed to be:

1. There are 20-30 games onlines each day.
2. Every registered user could play and submit the game to win scores.
3. For each game, every registered user could get the score for ONLY one time. i.e., No score will be calculated if the user had finished the game before.
4. To avoid wasting time on a game finished before, user will be notified with hint message in the page when enter a already finished game.

The current solution is:
3 tables are designed for the functions mentioned above.
Table A: UserTable --storing usering information, userid
Table B: GameList --storing all the game information.
Related fields:
GameID primary key
FinshiedTimes recording how many times the game has been finished
Table C: FinishHistory --storing who and when finished the game
Related fields:
GameID ID of the game
UserID ID of the user
FinishedDate the time when the game was finshied
PS: Fields listed above are only related ones, not the complete structure.

Each time when user enters the game, the program will read Table B(GameList), listing all the available game and the times games have been finished. User could then choose a desired game to play.

When user clicks the link and enter a page showing the detail content of the game, the program will read Table C(FinishHistory) to check whether user has finished this game before. If yes, hint message will be shown in the page.

When user finishes the game and submit, the program will again read Table C(FinishHistory) to check whether user has finished this game before. If yes, hint message will be shown in the page. If no, user will get the score.

Existing Problems:
With the increase of game and users, the capacity of Table C(FinishHistory) grows rapidly. And each time when a game is loaded, the Table C will be loaded to check, and when a game is submitted, the Table C will be loaded to check again. So it is only a time question to find out Table C to become a bottleneck.

Does any one here have any good suggestions to change / re-invent a new structure or design to avoid this bottleneck?

Kristen
Test

22859 Posts

Posted - 2007-02-09 : 01:08:31

"the Table C will be loaded to check"

The way I read that it sounds as though are you are loading the whole table to check for a single value.

If this is the case then you should use SQL to just check for the existence of a value:

SELECT 1 FROM TableC WHERE GameID = 123 AND UserID = 456

"the program will again read Table C(FinishHistory) to check whether user has finished this game before"

similarly:

INSERT TableC (GameID, UserID)
SELECT 123, 456
WHERE NOT EXISTS
(
SELECT * FROM WHERE GameID = 123 AND UserID = 456
)
SELECT [RowCount] = @@ROWCOUNT

this will return 1 or 0 depending whether a row was inserted, or already existed. If 0 you can display a message to the user

This will work fine with millions or rows in TableC

Kristen

hujiao
Starting Member

8 Posts

Posted - 2007-02-09 : 01:30:23

Thank Kristen for your reply.
It seems that i should not use the word "load". :)

so far, i am using statement like SELECT 1 FROM TableC WHERE GameID = 123 AND UserID = 456 to find whether it is finished before.

But i am worry about the growing speed of the TableC, with thousands games and thousands of users, it will be not long to find the TableC has millions of rows. Although i have built compound indexes on GameID and UserID, but i still think the bottleneck will appear some day.

Is there any alternative solutions to build the tables?

I was suggested to add an extra field in Table B, holding all the finished userID, which likes: "User001|User005|User007"

And when a game is loaded by a user, the program could match the user's id with this field to find whether the user has played this game before.

But this will need a big field to hold all the possible userIDs. say I have 10000 users, and the length of a unique user ID is 6 chars, so this field should be designed to be able to hold (6+1)*10000=60000, which is quite huge, right? and there is also potential lock confliction when serveral users finish a game at the same time.

Kristen
Test

22859 Posts

Posted - 2007-02-09 : 01:52:12

"Although i have built compound indexes on GameID and UserID, but i still think the bottleneck will appear some day."

I very much doubt it! If you have an index on Col1 + Col2, and that combination is unique (i.e. you have a Unique index), and your query says

WHERE Col1 = xxx and Col2 = yyy

(i.e. you are using EQUALS and not LIKE, or something similarly in-exact) then SQL will go straight to the record, no messing around. It will be as fast as a fast thing!

(You may need to make sure that Statistics are up to date on the table, but you need to have housekeeping procedures that do that anyway)

It would help if you delete all the entries when a Game is no longer available - but that is just to keep a lid on it. Maybe you would delete results after a year or somesuch - i.e. people could replay the game after a year ... but that's housekeeping again!

"I was suggested to add an extra field in Table B, holding all the finished userID, which likes: "User001|User005|User007""

Crap idea - excuse my French! You've got a database, make a child table, that's how relational databases work so efficiently.

Kristen

Kristen
Test

22859 Posts

Posted - 2007-02-09 : 01:53:40

"there is also potential lock confliction when serveral users finish a game at the same time"

Sorry, missed that bit.

That's probably true of the "User001|User005|User007" solution

All other things being equal you will not get any locks with an atomic statement like:

INSERT TableC (GameID, UserID)
SELECT 123, 456
WHERE NOT EXISTS
(
SELECT * FROM WHERE GameID = 123 AND UserID = 456
)
SELECT [RowCount] = @@ROWCOUNT

Kristen

hujiao
Starting Member

8 Posts

Posted - 2007-02-09 : 04:11:17

Hi Kristen,

"then SQL will go straight to the record, no messing around. It will be as fast as a fast thing!"
You mean it will go straight to the record, no matter how large the table is? :) i think at least the speed should be some slower when the table has millions rows or serveral millions rows, right? :)

Housekeeping...hmm...the only question is I can not put the game offline or allow people to re-play and re-get the score after 1 year. :(

hujiao
Starting Member

8 Posts

Posted - 2007-02-09 : 04:18:21

so far, i have 2000 games listed online and 110k lines in TableC, and I am sure the size will be doubled when i have 4000 games.

It is not surprising that the increasing speed of TableC is much higher than TableB and TableA. And i am looking for a way to have a new structure / design for TableC ( or anything else where the the history information are stored) whose increasing speed could match with TableB's speed.

Kristen
Test

22859 Posts

Posted - 2007-02-09 : 04:52:14

"You mean it will go straight to the record, no matter how large the table is? :) i think at least the speed should be some slower when the table has millions rows or serveral millions rows, right? :)"

Assuming you are not familiar with B-Trees it works something like this:

The first index page says "For A-F see page 2, for G-N see page 3 ..."

and then on, say, Page 2 is says "For A to BFXG see page 4, for BFXH - CKHD see page 5 ..."

and so on. To get a record requires, say, 3 blocks/levels to be checked.

The big "win" of this is that every additional "level" in the index increases the number of rows that can be "reached" exponentially.

You really do get to millions-and-millions of rows with very few index levels. So, (in simplistic terms) doubling the number of rows makes no performance difference - until you get to the point where there are too many rows and then another "level" is added to the index - and then every look-up starts taking (say) 4 reads, instead of 3.

One thing which will make a difference is the amount of memory. If the whole database fits into memory obviously it will be quicker than if only 10% of it will fit into memory. So in time that will be a factor.

But SQL Server is designed to handle these challenges very well, so I don't think you have to worry about it. Our application has a logging table which has about 10,000,000 rows added A DAY. Every INSERT has has least one UPDATE and several SELECTs (based on PK). We don't have any performance issues with this table.

"Housekeeping...hmm...the only question is I can not put the game offline or allow people to re-play and re-get the score after 1 year. :("

All the housekeeping can be done "on-line" (i.e. whilst users are connected), although it is best to schedule it for the quieter parts of the day/week if possible. Our applications are all 24/7 ...

I don't think you need to change your design, but an alternative would be to have a "bit" field in the User Table (or another table with the UserID as the PK) where each bit represents a game, and you add "bits" for any new game that a user has played. But the trouble with this is that you will have to retrieve the whole of the bit field from SQL Server to check it in your application - so the question then becomes "Can SQL Server check TableC quicker than it can return this really big field about the User's history to my application" - I would put my money on SQL Server being quicker!

Kristen

hujiao
Starting Member

8 Posts

Posted - 2007-02-12 : 20:44:14

Hi Kristen, really appreciate for your suggestions.
To be frankly, the database is built on Access, the site is using ASP+Access. I am runing the website on my own resources. ^-^ so far i don't have any plan to invest it to have a quicker SQL server. so i have no way but to find the max potential of Access db. :)

anyway, with your help, i am now sure my design should be ok with current solution, and should be more ok in SQL server. :) i would turn to find how to make the housekeeping more reasonalble, for the site users.

thanks again for your kindly help. ^-^

Kristen
Test

22859 Posts

Posted - 2007-02-13 : 02:17:47

"the database is built on Access ..."

Access will be using the same concept of B-Trees for its indexes - so will also be able to scale up very high numbers of records with little increase in retrieval times.

Kristen

hujiao
Starting Member

8 Posts

Posted - 2007-02-14 : 01:54:45

Yes, i know that. but the performance would dramatically go down in case the table owns millions of records. And according to the increasing speed, this will have happen in the short future. :)

Kristen
Test

22859 Posts

Posted - 2007-02-14 : 13:31:49

Perhaps you need a plan to move to SQL Server then?

hujiao
Starting Member

8 Posts

Posted - 2007-02-14 : 22:07:35

yes, as long as i have money. :) what i am now doing is to delay the time when i have to switch to SQL server as later as i can.

Kristen, how about this idea?
I split the TableC into serveral tables, TableC-1 holds all the history of Game 1 to Game 1000; TableC-2 holds all the history of Game starting from 1001 to 2000. This could decrease the size of each table. And it seems should have a better performance.
How do u think?

Kristen
Test

22859 Posts

Posted - 2007-02-15 : 06:08:36

"yes, as long as i have money"

Indeed, although the "junior" version of SQL Server has been free - since SQL 7 I think. But I don't think the license on that allows unlimited Web access.

Your multiple-table idea is fine, particularly if you are generating the SQL Queries dynamically in your application - you can just adjust the table you search in the SQL command string that you create.

You'll need something to automatically crate the new table when the N,000th game is added.

If you move to SQL Server you could re-combine all the tables into one (if that is fast enough) and just create a VIEW for each of the TableC-1 sub-sets - so your application will still be able to see the data as it always did!

In Access you might need to query a DIFFERENT database - I think Access's performance problems come as much from when the size of its file gets big as the number of records in a table. (But with your plan I presume you could as easily reference "TableC" in "Database-1", rather than "TableC-1" in a single database?)

I shouldn't worry about it until the performance bottleneck hits, but having a plan for solving it would be good.

Kristen

hujiao
Starting Member

8 Posts

Posted - 2007-02-26 : 20:42:37

Kristen, Really appreciate for your replies. it does me a good help. I will take the multiple-table solution in my website, and plan to move to SQL server if the site really goes up.

Thank u again. :) cheers, and happy for the Chinese New Year. ^-^

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources