Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Designing a Forum

Author  Topic 

dbwilson4
Yak Posting Veteran

50 Posts

Posted - 2007-06-21 : 12:14:47
I'm writing up the DB architecture for a "heavy" usage forum.

The team has decided to create columns in the topics table for ViewCount, TopicCount, and PostCount.

I was suggesting to run a dynamic SQL count statement but they said that it would be too slow especially on say...a topic that has 2 million posts and heavy traffic etc. They said this EVEN with an index.

Is this correct?

Kristen
Test

22859 Posts

Posted - 2007-06-21 : 14:29:24
Yes, probably. Counting a lot of rows, even with an index, is going to be pretty linear, and probably Block other queries too.

Kristen
Go to Top of Page

dbwilson4
Yak Posting Veteran

50 Posts

Posted - 2007-06-21 : 15:42:05
I asked some other DBA pros and there seems to be VERY mixed reviews on this.

One guy told me that since the viewcount,topiccount,postcount have to be incremented every few seconds on a heavy forum, it makes more sense to just run a count statement instead. And he said with an index, it would be fast.
Another guy said the opposite.

I'm at a lost on what I should do?
The forum is expected to have about 10 million posts spread out within the forums. This means that the main page which displays all of the forums (boards) will need to run a correlated count query for each forum.

For an example:
SELECT *, (SELECT COUNT(1) FROM forum_posts WHERE forumid = forum.id AND topicid IS NOT NULL) as topiccount,(SELECT COUNT(1) FROM forum_posts WHERE forumid = forum.id AND topicid IS NULL) as postcount,
(SELECT COUNT(1) FROM forum_posts_views WHERE forumid = forum.id ) as viewcount FROM forum

I really like to run sql count statements instead of using count columns because with count columns, you would have to use alot of transactions and can never be garanteed that the count is EXACT.
If I use it on a VERY fast machine (quad-core 2.8ghz, 8gig ram, 15k rpm raid 0 dual harddrives) do you think performance will be ok on the select count statements?
Go to Top of Page

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2007-06-21 : 16:28:19
The only way to know is test it.

If forumid, forum.id and topicid have good selectivity, then SQL Server should carve up those subqueries. You will also need to consider the write/read ratio.

Start with the normalised design first.

I recommend you create a view that abstracts the technique for returning the summaries. This should allow you to alter the db schema with minimal application impact.

DavidM

Production is just another testing cycle
Go to Top of Page

rudesyle
Posting Yak Master

110 Posts

Posted - 2007-06-21 : 16:30:22
quote:
Originally posted by dbwilson4

I asked some other DBA pros and there seems to be VERY mixed reviews on this.

One guy told me that since the viewcount,topiccount,postcount have to be incremented every few seconds on a heavy forum, it makes more sense to just run a count statement instead. And he said with an index, it would be fast.
Another guy said the opposite.

I'm at a lost on what I should do?
The forum is expected to have about 10 million posts spread out within the forums. This means that the main page which displays all of the forums (boards) will need to run a correlated count query for each forum.

For an example:
SELECT *, (SELECT COUNT(1) FROM forum_posts WHERE forumid = forum.id AND topicid IS NOT NULL) as topiccount,(SELECT COUNT(1) FROM forum_posts WHERE forumid = forum.id AND topicid IS NULL) as postcount,
(SELECT COUNT(1) FROM forum_posts_views WHERE forumid = forum.id ) as viewcount FROM forum

I really like to run sql count statements instead of using count columns because with count columns, you would have to use alot of transactions and can never be garanteed that the count is EXACT.
If I use it on a VERY fast machine (quad-core 2.8ghz, 8gig ram, 15k rpm raid 0 dual harddrives) do you think performance will be ok on the select count statements?



How often will these "counts" be referenced? I'm running a count(*) on a table with 5 million rows, it consistently takes 2 seconds, which is a while in web app world.

Can this content be deleted? if not, one option could be to have an identity column that you could just grab the next available row. This way no counting would be needed.

Other than that, I'd ask about what this would be used for (report, just eye candy for the user,etc...) and decide if it is needed, as it is resource intensive.
Go to Top of Page

dbwilson4
Yak Posting Veteran

50 Posts

Posted - 2007-06-22 : 01:40:26
Take a site like as large as myspace and it's myspace forums and with SEVERAL clustered DB servers (all with high optimized hardware), do you think SELECT COUNTs will perform poorly? The hardware i'm talking about are machines with 8 cpus, 16gigs of ram, raid 10 on 15k drives etc.

I need to use counts because the counts maybe different for each user per forum. Like some posts maybe hidden to other users while visible to some users and therefore the count should return the properly number etc. That is why I have to use SELECT counts instead of storing the counts in a column.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2007-06-22 : 03:04:38
"do you think SELECT COUNTs will perform poorly?"

Yes, but that isn't really the question!

The alternative is that you Update a record each time a new post is made (or one deleted) with a refreshed count (well, actually you just increment/decrement the value).

So for me the question would be "What is the ratio of Writes to Reads" and how long does a Trigger (say) take to do an Increment (multiplied by the number Writes) compared to how long a Count takes (multiplied by the number of Reads)

"some posts maybe hidden to other users "

Is that feature likely to be common? If its relatively rarely used you could just flag the thread record to indicate hidden posts and revert to a Count on those threads only.

But either way the only way you will know is to set up some data and run some timing tests

Kristen
Go to Top of Page

AndrewMurphy
Master Smack Fu Yak Hacker

2916 Posts

Posted - 2007-06-22 : 09:40:42
Do the counts need to be 100% absolutely accurate upto the second??

For low score topics...I'd agree....but it there a material difference between a score of 200000 and a score of 2000001. Would approx answers suffice?....with 100% accurate answers refreshed on a nightly basis...ie let the "displayed count" values be static (near static) for the large topics.
Go to Top of Page

efscl
Starting Member

4 Posts

Posted - 2007-06-22 : 09:53:25
Out of topic: you are really sure self developing a forum app?

THere are lot of scripts around which you can self modifiy. For asp.vb (next time it should come up with .net) for example http://www.webwizguide.com/web_wiz_forums/default.asp is a really great and secure implementation and you have rights to modify anything. It as also an easy member API for integrating the app in your own project .

regards
Sebastian
Go to Top of Page

dbwilson4
Yak Posting Veteran

50 Posts

Posted - 2007-06-22 : 11:40:43
It's not exactly a forum per se. It's like a mini social networking app.

When a user uploads a video, he can mark it as public, private, friends, or preferred groups (multiple preferred groups can be created). So when a public user views the videos, he should see the count of "public videos" and when someone views the videos for a specific preferred group, they should see the count for the "preferred group" videos. Because there can be multiple preferred groups created/deleted by the user, having count columns is not an option. The only solution left is to do SELECT counts.

The counts don't have to be 100% accurate...in fact it won't be because the data is "micro-cached" within the application layer (for a good 2 seconds).

Are you guys sure that SELECT Counts will be slooow on that type of hardware (by slow, acceptable time is 0-3 seconds).
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2007-06-25 : 08:18:43
You need to try it. I think the point is that a COUNT(*) is going to pull stuff into memory in order to count it. If you have very large numbers of rows you are going to use up lots of memory getting that data from disk in order to perform the count.

If you had the count cached somewhere (even if its not usable for 100% of the queries) that will be faster.

Can SQL Server count 1,000,000 rows in under 3 seconds? Sure.

Can it do it for 1,000 of concurrently connected users? I doubt it. Well, I reckon its the same problem of any of the heavy Iron databases, with sufficient hardware it probably can, the issues for you is probably whether that's the best (which probably equates to "Most scalable") design.

Kristen
Go to Top of Page

rudesyle
Posting Yak Master

110 Posts

Posted - 2007-06-25 : 09:37:32
quote:
Originally posted by dbwilson4

It's not exactly a forum per se. It's like a mini social networking app.

When a user uploads a video, he can mark it as public, private, friends, or preferred groups (multiple preferred groups can be created). So when a public user views the videos, he should see the count of "public videos" and when someone views the videos for a specific preferred group, they should see the count for the "preferred group" videos. Because there can be multiple preferred groups created/deleted by the user, having count columns is not an option. The only solution left is to do SELECT counts.

The counts don't have to be 100% accurate...in fact it won't be because the data is "micro-cached" within the application layer (for a good 2 seconds).

Are you guys sure that SELECT Counts will be slooow on that type of hardware (by slow, acceptable time is 0-3 seconds).


Can you do any benchmarking now? Just use a stored procedure with 4 outputs. This will allow you to try the select count(*), and if you need to implement something else due to performance, you can do so without messing with the code that uses the procs.

Go to Top of Page
   

- Advertisement -