Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
wgpubs
Yak Posting Veteran
67 Posts
Posted - 2003-07-09 : 15:09:31
Hi folks,

I'm working on a site where you can rate articles on a scale from 1 to 10. I want to give users the ability to view the following:

* the top 10 articles for today (only for today)
* the top 10 articles for a given week (past weeks in the current month, plus the current week)
* the top 10 articles for a given month (past months in the current year, plus the current month)
* the top 10 articles for a given year (past years in the current decade, plus the current year)

Any ideas on the best way to structure my database, queries, views, etc. to accomplish this in the most efficient way? So far I put all the rating data (total votes, total ranking points, etc.) into a separate table for each day, which lets me do all of the above *if I keep the data around forever*. THE PROBLEM: what if there are thousands of articles? If so, this particular table could get rather large and slow (if not downright unusable).

Thanks - wg
X002548
Not Just a Number
15586 Posts
Posted - 2003-07-09 : 16:22:24
Whew... did I get carried away with this one... TALLY HO!

USE Northwind
GO

CREATE TABLE myTallyArticle (col1 int)
GO

SET NOCOUNT ON
DECLARE @ctr int
SELECT @ctr = 1
WHILE @ctr < 1000
BEGIN
    INSERT INTO myTallyArticle (col1) SELECT @ctr
    SELECT @ctr = @ctr + 1
END
SET NOCOUNT OFF
GO

CREATE TABLE myTallyRating (col1 int)
GO

INSERT INTO myTallyRating (col1)
SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 10
GO

CREATE TABLE myTable99 (ArticleId int, Rating int, RatingDate datetime)
GO

SET NOCOUNT ON
DECLARE @ctr int, @article int, @rating int
SELECT @ctr = 1

-- TODAY
WHILE @ctr < 100
BEGIN
    SELECT TOP 1 @Article = col1 FROM myTallyArticle ORDER BY NEWID()
    -- Couldn't get this to work, always returns 1
    -- SELECT TOP 1 col1 FROM myTallyRating ORDER BY NEWID()
    SELECT @Rating = CASE WHEN CONVERT(int,SUBSTRING(RIGHT(CONVERT(varchar(26),GetDate(),126),3),3,1))+1 > 10
                          THEN 10
                          ELSE CONVERT(int,SUBSTRING(RIGHT(CONVERT(varchar(26),GetDate(),126),3),3,1))+1
                     END
    INSERT INTO myTable99 (ArticleId, Rating, RatingDate)
    SELECT @Article, @Rating, GetDate()
    SELECT @ctr = @ctr + 1
END

-- WEEK
WHILE @ctr < 1000
BEGIN
    SELECT TOP 1 @Article = col1 FROM myTallyArticle ORDER BY NEWID()
    SELECT @Rating = CASE WHEN CONVERT(int,SUBSTRING(RIGHT(CONVERT(varchar(26),GetDate(),126),3),3,1))+1 > 10
                          THEN 10
                          ELSE CONVERT(int,SUBSTRING(RIGHT(CONVERT(varchar(26),GetDate(),126),3),3,1))+1
                     END
    INSERT INTO myTable99 (ArticleId, Rating, RatingDate)
    SELECT @Article, @Rating, DATEADD(wk, -1, GetDate())
    SELECT @ctr = @ctr + 1
END

-- MONTH
WHILE @ctr < 10000
BEGIN
    SELECT TOP 1 @Article = col1 FROM myTallyArticle ORDER BY NEWID()
    SELECT @Rating = CASE WHEN CONVERT(int,SUBSTRING(RIGHT(CONVERT(varchar(26),GetDate(),126),3),3,1))+1 > 10
                          THEN 10
                          ELSE CONVERT(int,SUBSTRING(RIGHT(CONVERT(varchar(26),GetDate(),126),3),3,1))+1
                     END
    INSERT INTO myTable99 (ArticleId, Rating, RatingDate)
    SELECT @Article, @Rating, DATEADD(mm, -1, GetDate())
    SELECT @ctr = @ctr + 1
END
SET NOCOUNT OFF
GO

SELECT TOP 10 ArticleId, SUM(Rating)/COUNT(Rating) AS Avg_Rating, COUNT(Rating) AS Num_Ratings
  FROM myTable99
 WHERE DATEDIFF(d, RatingDate, GetDate()) = 0
 GROUP BY ArticleId
 ORDER BY 2 DESC

SELECT TOP 10 ArticleId, SUM(Rating)/COUNT(Rating) AS Avg_Rating, COUNT(Rating) AS Num_Ratings
  FROM myTable99
 WHERE DATEDIFF(wk, RatingDate, GetDate()) <= 1
 GROUP BY ArticleId
 ORDER BY 2 DESC

SELECT TOP 10 ArticleId, SUM(Rating)/COUNT(Rating) AS Avg_Rating, COUNT(Rating) AS Num_Ratings
  FROM myTable99
 WHERE DATEDIFF(m, RatingDate, GetDate()) <= 1
 GROUP BY ArticleId
 ORDER BY 2 DESC
GO

DROP TABLE myTallyArticle
GO
DROP TABLE myTallyRating
GO
DROP TABLE myTable99
GO

Brett
8-)
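One thing to watch in the ranking queries above: SUM(Rating)/COUNT(Rating) on an int column is integer division in T-SQL, so an average of 7.67 silently truncates to 7. A quick illustration of the pitfall (in Python for easy running; `//` mimics T-SQL's integer division):

```python
# SUM(int)/COUNT(int) truncates in T-SQL; Python's // mimics that behavior.
ratings = [7, 8, 8]

truncated = sum(ratings) // len(ratings)   # what SUM(Rating)/COUNT(Rating) gives
exact = sum(ratings) / len(ratings)        # what a float cast would give

print(truncated)  # 7
print(exact)      # 7.666...
```

In the T-SQL above, `AVG(CAST(Rating AS float))` or `SUM(CAST(Rating AS float))/COUNT(Rating)` avoids the truncation.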
wgpubs
Yak Posting Veteran
67 Posts
Posted - 2003-07-09 : 18:21:08
What about past weeks, months and years? If, let's say, there are over 500k articles, I don't want to keep the data required for the summaries any longer than necessary.

One thought I had... Envision the following tables:

* DailyActivity(date, articleid, totalvotingpoints, totalvotes, avg)
* WeeklyActivity(date, articleid, totalvotingpoints, totalvotes, avg)
* MonthlyActivity(date, articleid, totalvotingpoints, totalvotes, avg)
* YearlyActivity(date, articleid, totalvotingpoints, totalvotes, avg)
* RatingSummary(date, articleid, totalvotingpoints, totalvotes, avg, summarytype [week, month, year])

And here are the steps:

1. Every time a user rates an article, the vote is added to a DailyActivity table that maintains the avg, total votes, etc. for the article. This table has one row for every day/articleid. [this allows me to get the top 20 for the current day]

2. Have a script that runs nightly that does the following:

a. Add all records in DailyActivity for the prior day into a table called WeeklyActivity (pk = week/articleid). [this allows me to get the top 20 for the current week]
b. Add all records in DailyActivity for the prior day into a table called MonthlyActivity (pk = month/articleid). [this allows me to get the top 20 for the current month]
c. Add all records in DailyActivity for the prior day into a table called YearlyActivity (pk = year/articleid). [this allows me to get the top 20 for the current year]
d. Delete all records in DailyActivity for the prior day because they are no longer needed. [reduces the # of records in that table]
e. If the prior day is the end of the week, get *only* the top 20 from WeeklyActivity for the prior week and store them in the RatingSummary table. [allows me to quickly get what I want for past weeks]
f. Delete all records in WeeklyActivity for the prior week since they have been summarized and are no longer needed. [reduces the # of records in that table]
g. If the prior day was the end of the month, get *only* the top 20 from MonthlyActivity for the prior month and store them in the RatingSummary table. [allows me to quickly get the top 20 for past months]
h. Delete all records in MonthlyActivity for the prior month since they have been summarized and are no longer needed. [reduces the size of the table]
i. If the prior day was the end of the year, get *only* the top 20 from YearlyActivity for the prior year and store them in the RatingSummary table. [allows me to quickly get past years' summaries]
j. Delete all records in YearlyActivity for the prior year since they have been summarized and are no longer needed. [reduces the size of the table]
k. Delete any weekly data in RatingSummary that is > 4 weeks old; I only need to persist the last 4 weeks.
l. Delete any monthly data in RatingSummary that is > 12 months old; I only need to persist the last 12 months.
m. Delete any yearly data in RatingSummary that is > 10 years old; I only need to persist the last 10 years.
n. Perhaps create a view that aggregates top 20 information for the current day, week, month, and year AND past weeks, months, and years, to simplify lookups.

Is this a good approach? Any recommendations on a different/better way to accomplish this? Basically I want to (1) make summary lookups fast and (2) keep the table sizes as small as possible (given an article count of 500k+ this is important).

Any thoughts???

Thanks - wg
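The nightly rollup wg describes can be sketched in miniature. This is an illustrative Python/sqlite3 version, not the T-SQL you'd run in production: table and column names are made up for the sketch, SQLite's upsert stands in for whatever merge logic the real job would use, and only the weekly path (steps a, d, e) is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema for the sketch -- not the thread's actual tables.
cur.executescript("""
CREATE TABLE DailyActivity  (day TEXT, articleid INT, points INT, votes INT);
CREATE TABLE WeeklyActivity (week TEXT, articleid INT, points INT, votes INT,
                             PRIMARY KEY (week, articleid));
CREATE TABLE RatingSummary  (period TEXT, summarytype TEXT, articleid INT,
                             points INT, votes INT);
""")

# A day's worth of votes: article 1 rated twice (9 and 7), article 2 once (4).
cur.executemany("INSERT INTO DailyActivity VALUES (?,?,?,?)",
                [("2003-07-09", 1, 9, 1), ("2003-07-09", 1, 7, 1),
                 ("2003-07-09", 2, 4, 1)])

# Step a: fold the prior day into the weekly rollup (upsert on week/articleid).
cur.execute("""
INSERT INTO WeeklyActivity (week, articleid, points, votes)
SELECT '2003-W28', articleid, SUM(points), SUM(votes)
  FROM DailyActivity WHERE day = '2003-07-09'
 GROUP BY articleid
ON CONFLICT(week, articleid) DO UPDATE SET
  points = points + excluded.points,
  votes  = votes  + excluded.votes
""")

# Step d: the daily rows are no longer needed.
cur.execute("DELETE FROM DailyActivity WHERE day = '2003-07-09'")

# Step e: at week's end, keep only the top N (by average) in the summary table.
cur.execute("""
INSERT INTO RatingSummary (period, summarytype, articleid, points, votes)
SELECT week, 'week', articleid, points, votes
  FROM WeeklyActivity WHERE week = '2003-W28'
 ORDER BY CAST(points AS REAL) / votes DESC
 LIMIT 20
""")

top = cur.execute("""SELECT articleid, points, votes FROM RatingSummary
                     WHERE summarytype = 'week' ORDER BY points DESC""").fetchall()
print(top)  # [(1, 16, 2), (2, 4, 1)]
```

The monthly and yearly paths (steps b-c, f-m) would repeat the same insert/prune pattern against their own tables.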
setbasedisthetruepath
Used SQL Salesman
992 Posts
Posted - 2003-07-10 : 09:09:59
You are overemphasizing the supposed negative effects of "large" rowset sizes. 500K+ rows is hardly anything when the rows are narrow (i.e. a relatively low # of bytes per row). If you had 500 million rows, that would be something else.

Rather than engineer yourself into paralysis, try a simple schema with no denormalization and write the queries you need. If the performance is inadequate, come back to us.

Jonathan
{0}
Amethystium
Aged Yak Warrior
701 Posts
Posted - 2003-07-10 : 10:02:30
You really ought to call yourself 'SQL Scrub'. How you can find the energy to write so much code for other people I don't know. Do you work part-time, or is it just a compulsive desire to write SQL?

Nice to see you using Tally tables though.

Regards.

Shadow to Light
X002548
Not Just a Number
15586 Posts
Posted - 2003-07-10 : 10:06:07
I'd keep it simple... The table I gave you is all you should need. If you add all of what you discuss, you'll complicate things, and logically it may be hard to follow.

Also, the things you are discussing are more in the realm of data warehousing, where all "facts and dimensions" are calculated and stored and never change... which is not a bad thing.

Do you have Enterprise Edition?

Brett
8-)
wgpubs
Yak Posting Veteran
67 Posts
Posted - 2003-07-10 : 10:44:21
Yes, I have Enterprise Edition... but I have zero experience with data warehousing. Any help on where to start and/or how to accomplish what I'm looking to do via that route would be much appreciated.

Thanks - wg

quote:
I'd keep it simple... The table I gave you is all you should need. If you add all of what you discuss, you'll complicate things, and logically it may be hard to follow. Also, the things you are discussing are more in the realm of data warehousing, where all "facts and dimensions" are calculated and stored and never change... which is not a bad thing. Do you have Enterprise Edition?
Brett
8-)
X002548
Not Just a Number
15586 Posts
Posted - 2003-07-10 : 11:03:58
quote:
You really ought to call yourself 'SQL Scrub'. How you can find the energy to write so much code for other people I don't know. Do you work part-time, or is it just a compulsive desire to write SQL? Nice to see you using Tally tables though.
Regards.
Shadow to Light

Anything to practice something new... tallies are a new concept for me... Hey, and multi-tasking is the way to go...

Anyway, anything I practice and learn here, I eventually employ in some other prod environment... Building stuff as we speak...

PS A lot of it, if you noticed, is just cut and pasted several times... I really don't like to type (too) much.

Brett
8-)

Edited by - x002548 on 07/10/2003 11:11:47
wgpubs
Yak Posting Veteran
67 Posts
Posted - 2003-07-10 : 11:57:13
Just out of curiosity... what is considered a *large* table? For example, given the proposed solution in the second post on this thread, at what point would the myTable99 table be considered too big?

Thanks - wg

quote:
You are overemphasizing the supposed negative effects of "large" rowset sizes. 500K+ rows is hardly anything when the rows are narrow (i.e. a relatively low # of bytes per row). If you had 500 million rows, that would be something else. Rather than engineer yourself into paralysis, try a simple schema with no denormalization and write the queries you need. If the performance is inadequate, come back to us.
Jonathan
{0}
setbasedisthetruepath
Used SQL Salesman
992 Posts
Posted - 2003-07-10 : 12:12:23
Bear in mind it depends substantially on the indexing, but if you're doing table scans, then what you are most interested in is the # of rows per page.

A data page holds 8060 bytes. The row size of myTable99 is 16 bytes (2 ints and a datetime), so you can fit about 500 rows per page, which is very dense. 500,000 rows is only about 1000 data pages - a pittance for the I/O subsystem to return, and that's assuming none of it was in memory already.

I would say "large" for such a narrow table would start at 10 million rows if you're running this on a desktop with a single IDE drive.

Jonathan
{0}
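Jonathan's figures can be sanity-checked with quick arithmetic. This ignores SQL Server's per-row and per-page overhead, so treat the results as back-of-the-envelope numbers:

```python
# Rough check of the density figures above; real SQL Server rows carry a few
# bytes of overhead each, so these are slightly optimistic.
page_bytes = 8060               # usable bytes per data page, as stated above
row_bytes = 4 + 4 + 8           # two ints + one datetime

rows_per_page = page_bytes // row_bytes
pages = -(-500_000 // rows_per_page)    # ceiling division

print(rows_per_page)        # 503 -- "about 500 rows per page"
print(pages)                # 995 -- "only about 1000 data pages"
print(pages * 8 // 1024)    # ~7 MB of data pages at 8 KB each
```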
X002548
Not Just a Number
15586 Posts
Posted - 2003-07-10 : 12:44:04
WG,

Which is about 20,000 pages.

Jonathan, correct me where I'm wrong, but what he is saying is that you need to understand how SQL Server is going to get at the data. In this example, all of it is going to scan, because you need to interrogate every row.

The database is split into data pages and indexes (usually). Think of them as separate "files", for lack of a better term. A data page usually contains rows that are much larger, and hence there are more pages to read. An index will contain only the columns you want, and so be much smaller. So in this case myTable99 looks more like an index than a table (even though it is a table).

So if you place an index on a column, and it contains all of the columns required by your SQL statement, SQL Server will use only the index (very efficient). If you have a WHERE clause that can use the index, it will find pointers to the data pages and get only the pages you need (good performance). If SQL Server can't figure things out, it'll scan ALL of the data pages (poor performance, but sometimes unavoidable).

So the answer is (and I love this answer): it depends. You need to be able to forecast the growth of your db, and understand what data you need. For example, in a ledger reporting system the requirement is to keep 13 rolling months of data... I just archive the rest to history tables that are not being accessed by the application. Just in case... (they always want something more...)

If you need any more help with design, let us know, but I'd say stick with a normalized database, and don't worry about a warehouse unless you're going terabyte city...

Brett
8-)
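The "index contains all of the columns required by your SQL statement" case Brett describes is a covering index. A small demonstration using Python's sqlite3 rather than SQL Server (the concept carries over, and SQLite conveniently labels it in its query plan):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE myTable99 (ArticleId INT, Rating INT, RatingDate TEXT)")

# An index holding every column the query below touches.
cur.execute("CREATE INDEX ix_cover ON myTable99 (RatingDate, ArticleId, Rating)")

plan = cur.execute("""
    EXPLAIN QUERY PLAN
    SELECT ArticleId, Rating FROM myTable99 WHERE RatingDate = '2003-07-09'
""").fetchall()

# SQLite reports something like
#   "SEARCH myTable99 USING COVERING INDEX ix_cover (RatingDate=?)"
# meaning the base table's data pages are never touched at all.
for row in plan:
    print(row[-1])
```

Drop `Rating` from the index and the plan changes: SQLite must hop back to the table for the missing column, which is the "pointers to the data pages" case Brett mentions.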
setbasedisthetruepath
Used SQL Salesman
992 Posts
Posted - 2003-07-10 : 12:59:28
I'm not sure if you're addressing the tutorial to him or me... As I said earlier in the thread, my recommendation is to start with a normalized schema.

Jonathan
{0}
X002548
Not Just a Number
15586 Posts
Posted - 2003-07-10 : 14:57:26
Jonathan,

He signs himself WG... which is whom I addressed it to... Got a feeling you don't need a tutorial... but he asked the question, and you brought up access methods, which I thought would blow his/her mind.

And yes, normalize (denormalize only where that doesn't work). Always start there.

PS Did I mis-state anything there?

Brett
8-)