Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
AskSQLTeam
Ask SQLTeam Question
0 Posts |
Posted - 2003-06-02 : 07:35:27
|
| Anthony writes "I was interested to read the post/discussion on horizontal partitioning of tables, as I considering doing it on a big scale. The application involved helps people prepare for exams and records every question answered by the users. Most people answer 5000 to 10000 questions over a 6 month period and there are currently around 10000 users at any one time (not necessarily using the system concurrently). So, as a single entity the question tracking table is very big and is constantly being written to by a number of concurrent users. There is very little need for querying data from multiple users so I am considering using separate tables for each user. What I'd like to know is:- will performance suffer if there are a very large number of tables in a database (10000 to 20000)?- what is the effect on performance of using a view (as described in the previous post on this topic) vis-a-vis querying the partitioned tables directly? (in my application, it would be as easy to use a session variable to identify the user's tracking table, as it is to put the where clause into a query on the view)Thanks" |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2003-06-02 : 08:04:35
|
quote: There is very little need for querying data from multiple users so I am considering using separate tables for each user.
Do yourself a HUGE favor and stop considering that right now. While SQL Server can handle thousands of tables in a database, you will not see an appreciable (if any) improvement in performance with that many tables (probably the reverse), and you'll encounter far more problems trying to find the right table. And while you say that you don't need to query for multiple users, you WILL at some point, and you'll never be able to UNION 10,000 tables together effectively.While you might be looking at 100 million rows worth of data, if the table is fairly narrow (10 columns or less) you will probably be better off using a single table. If you do partition the data, do it based on blocks of user ID's (NOT one userID per table) I don't know how you have your questions and answers tables structure, but you don't need anything more than the following:CREATE Table UserAnswers (UserID int NOT NULL, QuestionID int NOT NULL, AnswerID int NOT NULL)This assumes you have Users, Questions, and Answers tables that contain their corresponding data. The UserAnswers table is very narrow and can accommodate 100 million rows or more very nicely. If you create a clustered index on UserID, you'll get excellent performance without the hassle of splitting into multiple tables.Also, horizontal partitioning is not going to be very effective if all the tables are on the same filegroup, and/or if multiple filegroups are on the same physical disk. You need to have separate disks and filegroups in order to gain the most benefit from it. |
 |
|
|
amiller
Starting Member
4 Posts |
Posted - 2003-06-02 : 09:31:48
|
| Hi, thanks for the advice.The idea of having 10000+ tables did seem a little crazy, which why I wanted to sound it out on this board. My current table is much as you describe below (it also has a field for time taken to answer and a date/time field). There aren't any problems with select queries on this table. Where we've encountered issues is with the insert queries. At busy times we can end up having around 5000 inserts per minute into this same table. I don't know a lot about the internal working of sql server, but i assume there must be some form of locking taking place for insert operations. One other thing we have in the table at the moment is a counter (autonumber) field as the primary key. Would this field increase the locking problem for inserts, since the sql server has to keep ensuring its unique? |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-06-02 : 09:56:16
|
| Why do you have an identity column? Just to have a unique Id per row? Is there something else that will uniquley identify each row?Reason for asking this is because if you can, a partitioned view (a series of tables unioned together) might help.Placing each table on a separate drive, and make the name of the view the same as the existing table.Brett8-) |
 |
|
|
amiller
Starting Member
4 Posts |
Posted - 2003-06-02 : 10:38:18
|
| Yeah, the identity column is just so that there's a simple unique id for each record. Without this column the primary key would have to be a combination of userId, questionId and the date/time field. Well thinking about it, userId and date/time should be sufficient.Maybe what I need is some sort of in between solution then. Not one table per user, and not one table for all. I guess I could partition it into 5-10 tables and try to create some process to distribute users from different parts of the world evenly across the tables to ensure the load is spread relatively evenly across them at all times. Assuming that its concurrent inserts to the same table that are causing poor performance this might help I guess. On the other hand, if its simply down the size of the table I guess I'd be better to create archive tables every 1,000,000 records or so and create a partitioned view across these??!? |
 |
|
|
|
|
|
|
|