SQL Server Forums
SQL Server Administration (2005)

Topic: query performance strategy

mike123
Flowing Fount of Yak Knowledge

1462 Posts

Posted - 04/18/2008 :  08:20:01
Hi,

I have just created a logging table that I anticipate will have tens of millions of rows (maybe hundreds of millions eventually).

Basically it's a very narrow table; we are using it to log hits on images for a webserver.

My question is that we want to run queries that show how many rows are logged per day etc., but we want to make sure these queries, which we anticipate will be very heavy, do not slow down the system.

I have been recommended to have a separate database (mirror/replica) for reporting so that the performance of regular activity will not be affected.

I assume this means I would need another server for this other database?

I am thinking there are probably some alternative solutions to this as well. Getting a dedicated server just for these queries really isn't an option.

To improve performance it is not a problem to make some sacrifices. For example, having the data update every 15 minutes is more than acceptable.

I see certain websites I use employ this strategy of making the data update every 15 minutes, but I am unsure what is going on behind the scenes. The queries are lightning fast when run, so I am thinking they have some sort of table populated with precomputed data, which is quick to query.


Any thoughts or suggestions to give me some direction, are very much appreciated !

thanks once again,
Mike123

pootle_flump
Flowing Fount of Yak Knowledge

United Kingdom
1064 Posts

Posted - 04/18/2008 :  09:52:33
Hi Mike

Can you show the queries you will use (particularly what you will filter on)? If the table is so narrow then presumably there can't be many. Also, are you pulling up singletons or ranges?

Lumbago
Norsk Yak Master

Norway
3271 Posts

Posted - 04/18/2008 :  09:57:50
To be honest I didn't really understand all of your requirements, but what I can say is that caching can be a very good friend. If you can crunch your data once every 15 minutes and populate the results into a caching table, it will be a lot faster to query than running the full query for each request to a website or whatever.

You could also give partitioning a glance; look up "partitioned tables" in BOL. It's only supported in Enterprise Edition, but you can easily create your own custom version using partitioned views. Sharing the data load between tables can be a real performance booster if done right.
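As a minimal sketch of the caching-table idea (all names here - the source table, the cache table, its columns - are illustrative, not from the thread, and the 15-minute schedule would be a SQL Agent job):

```sql
-- Hypothetical summary table, rebuilt every 15 minutes by a SQL Agent job.
CREATE TABLE dbo.ImageStatsCache
	(
		StatDate	DATETIME	NOT NULL,
		UserID		INT		NOT NULL,
		ImageCount	INT		NOT NULL,
		CONSTRAINT pk_ImageStatsCache PRIMARY KEY CLUSTERED (StatDate, UserID)
	)
GO

-- Refresh step: recompute today's counts and swap them in.
-- CONVERT(CHAR(8), ..., 112) truncates a datetime to its date ('yyyymmdd').
BEGIN TRANSACTION
	DELETE dbo.ImageStatsCache
	WHERE  StatDate = CONVERT(CHAR(8), GETDATE(), 112)

	INSERT dbo.ImageStatsCache (StatDate, UserID, ImageCount)
	SELECT CONVERT(CHAR(8), ServeDate, 112), UserID, COUNT(*)
	FROM   dbo.ImageLog
	WHERE  ServeDate >= CONVERT(CHAR(8), GETDATE(), 112)
	GROUP  BY CONVERT(CHAR(8), ServeDate, 112), UserID
COMMIT
```

The website then reads from the small cache table instead of scanning the raw log on every request.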



--
Lumbago

Edited by - Lumbago on 04/18/2008 09:58:33

pootle_flump
Flowing Fount of Yak Knowledge

United Kingdom
1064 Posts

Posted - 04/18/2008 :  09:58:11
Sorry - reread your question.

CREATE TABLE dbo.logging
	(
		datetimeinserted	DATETIME NOT NULL			CONSTRAINT df_logging_datetimeinserted DEFAULT GETDATE()
		, uniqifier			INT IDENTITY(1, 1) NOT NULL
		, datacol			SMALLINT
		, CONSTRAINT pk_logging		PRIMARY KEY CLUSTERED (datetimeinserted, uniqifier) WITH (FILLFACTOR = 100)
	)
GO
Inserts are as fast as they can be (monotonically increasing clustered index - faster than a heap). A range scan on date as you describe would be rapid: a single seek to the start of the range and a scan from there.
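For example, a daily-count query over that table (the date boundaries here are purely illustrative) can seek straight to the start of the range on the clustered index:

```sql
-- Hits per day for a one-week range; an illustrative query, not from the thread.
SELECT CONVERT(CHAR(8), datetimeinserted, 112)	AS LogDate,
       COUNT(*)					AS Hits
FROM   dbo.logging
WHERE  datetimeinserted >= '20080401'
AND    datetimeinserted <  '20080408'
GROUP  BY CONVERT(CHAR(8), datetimeinserted, 112)
ORDER  BY LogDate
```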

mike123
Flowing Fount of Yak Knowledge

1462 Posts

Posted - 04/18/2008 :  15:34:05
quote:
Originally posted by pootle_flump

Hi Mike

Can you show the queries you will use (particularly what you will filter on)? If the table is so narrow then presumably there can't be many. Also, are you pulling up singletons or ranges?



Hi Pootle,

I want to run queries that show the usage history for each user. I have included the table below.

Basically we want to return result sets that have a row for each day and show the counts of total images served (possibly another one that shows images served by domain etc.). Basically a lot of COUNT(*)s and GROUP BYs.

I haven't written the query yet; if this is too general please let me know and I'll come up with something. I'm just going for a general strategy here, as I haven't really seen any strategies where people use tables to store precomputed stats, although I am sure they exist.


For example, if the user serves 501,343 images yesterday, that number will never change. Perhaps we could store it somewhere in the database. Perhaps we could even auto-populate it after midnight.

Just an idea, really would love to hear feedback from whatever people have done in this type of situation before.

Thanks again! much appreciated
mike123






CREATE TABLE [dbo].[ImageLog]
	(
		[ImageLogID]	[int] IDENTITY(1,1)	NOT NULL,
		[imageID]	[int]			NULL,
		[DomainID]	[int]			NULL,
		[ServeDate]	[datetime]		NOT NULL,
		[UserID]	[int]			NULL,
		[Status]	[char](1)		NULL
	)
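The kind of query described above might look something like this (a sketch only - the exact grouping columns would depend on the report):

```sql
-- One row per user per day, with total images served.
SELECT UserID,
       CONVERT(CHAR(8), ServeDate, 112)	AS ServeDay,
       COUNT(*)				AS ImagesServed
FROM   dbo.ImageLog
GROUP  BY UserID, CONVERT(CHAR(8), ServeDate, 112)
ORDER  BY UserID, ServeDay
```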








mike123
Flowing Fount of Yak Knowledge

1462 Posts

Posted - 04/18/2008 :  15:37:49
quote:
Originally posted by Lumbago

To be honest I didn't really understand all of your requirements, but what I can say is that caching can be a very good friend. If you can crunch your data once every 15 minutes and populate the results into a caching table, it will be a lot faster to query than running the full query for each request to a website or whatever.

You could also give partitioning a glance; look up "partitioned tables" in BOL. It's only supported in Enterprise Edition, but you can easily create your own custom version using partitioned views. Sharing the data load between tables can be a real performance booster if done right.



--
Lumbago



Hi Lumbago,

I haven't used horizontal table partitioning before, but this looks like it could be the time to use it. I'm not sure what to expect, but I definitely will be looking into it.

I definitely plan to cache on the web application side, so we minimize the number of times the queries get run. But I am not sure how effective that's going to be.

I like the way this database caching sounds. How exactly would this work?

In my above post, I mention an example:

If user 1 serves 324,234 images on 1/1/2008, we know the count for that date for total images served. That will never increase or decrease, so we could store it in a table somewhere. Maybe even auto-populate this table after midnight, so it runs especially fast the next day. This could get pretty complicated for different types of queries though, as the queries would get more complicated than just COUNT(*) WHERE userID = 1.

Any thoughts on this ? your input is greatly appreciated

Thanks very much!
mike123

Edited by - mike123 on 04/18/2008 15:38:18

pootle_flump
Flowing Fount of Yak Knowledge

United Kingdom
1064 Posts

Posted - 04/19/2008 :  15:14:02
You can absolutely prepopulate. The main problem is the "etc." bit. There is no such thing as an optimal indexing strategy without considering the queries you will run against the table. It's no good having an index on a column that never appears in a WHERE predicate (or join, or GROUP BY, or ORDER BY...).

It sounds like you will always be looking for an aggregation of data between time x and time y which, for starters, means you would want the dateinserted column to be the leading column of either the clustered index or a covering index for that particular query. Would you be looking at one user at a time or all users? Grouping by user or lumping them all together? And so on.

Pre-aggregating is a great strategy; however, if the date range you use for your queries might vary, then you can only aggregate down to the smallest range (possibly by day and user), which is not, I suspect, especially useful. A datamart might be of use to you here, but I'm afraid I have no experience of something like this.
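To sketch the indexing point: assuming the queries filter on ServeDate and group by UserID (an assumption - the actual queries haven't been posted), a covering index for the ImageLog table might look like:

```sql
-- Illustrative only: ServeDate leads so a date-range predicate can seek,
-- and UserID is a key column so the grouping query never touches the base table.
CREATE NONCLUSTERED INDEX ix_ImageLog_ServeDate_UserID
	ON dbo.ImageLog (ServeDate, UserID)
```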

HTH

SwePeso
Patron Saint of Lost Yaks

Sweden
30102 Posts

Posted - 04/19/2008 :  17:52:00
Have a job that each day aggregates the previous day's "clicks" and stores them in a history table.
Delete the records from the source.

Then put a clustered index over the date column of the history table, with a fill factor of 100%.
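A sketch of that daily job, assuming the ImageLog table posted earlier and a hypothetical ImageLogHistory table (the names and the day+user grain are assumptions, not from the post):

```sql
-- History table: one row per user per day, clustered on date, fill factor 100.
CREATE TABLE dbo.ImageLogHistory
	(
		ServeDay	DATETIME	NOT NULL,
		UserID		INT		NOT NULL,
		ImagesServed	INT		NOT NULL,
		CONSTRAINT pk_ImageLogHistory
			PRIMARY KEY CLUSTERED (ServeDay, UserID) WITH (FILLFACTOR = 100)
	)
GO

-- Daily job body (e.g. a SQL Agent job scheduled just after midnight).
DECLARE @Today DATETIME, @Yesterday DATETIME
SELECT  @Today     = CONVERT(CHAR(8), GETDATE(), 112),
        @Yesterday = DATEADD(DAY, -1, CONVERT(CHAR(8), GETDATE(), 112))

BEGIN TRANSACTION
	-- Aggregate yesterday's rows into the history table...
	INSERT dbo.ImageLogHistory (ServeDay, UserID, ImagesServed)
	SELECT @Yesterday, UserID, COUNT(*)
	FROM   dbo.ImageLog
	WHERE  ServeDate >= @Yesterday AND ServeDate < @Today
	GROUP  BY UserID

	-- ...then remove them from the source so it stays small.
	DELETE dbo.ImageLog
	WHERE  ServeDate >= @Yesterday AND ServeDate < @Today
COMMIT
```

A fill factor of 100 makes sense here because rows are only ever appended at the end of the date order, so page splits in the middle of the index are not a concern.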



E 12°55'05.25"
N 56°04'39.16"

mike123
Flowing Fount of Yak Knowledge

1462 Posts

Posted - 04/25/2008 :  12:28:36
quote:
Originally posted by Peso

Have a job that each day aggregates the previous day's "clicks" and stores them in a history table.
Delete the records from the source.

Then put a clustered index over the date column of the history table, with a fill factor of 100%.



E 12°55'05.25"
N 56°04'39.16"




this sounds like exactly what I need, thanks so much for the suggestion! I will try this out and let you know how it goes!

I'll try to post more pertinent info such as table design and queries etc, when I get to that point.

thanks again :D
mike123




mike123
Flowing Fount of Yak Knowledge

1462 Posts

Posted - 04/25/2008 :  12:29:23
quote:
Originally posted by pootle_flump

You can absolutely prepopulate. The main problem is the "etc." bit. There is no such thing as an optimal indexing strategy without considering the queries you will run against the table. It's no good having an index on a column that never appears in a WHERE predicate (or join, or GROUP BY, or ORDER BY...).

It sounds like you will always be looking for an aggregation of data between time x and time y which, for starters, means you would want the dateinserted column to be the leading column of either the clustered index or a covering index for that particular query. Would you be looking at one user at a time or all users? Grouping by user or lumping them all together? And so on.

Pre-aggregating is a great strategy; however, if the date range you use for your queries might vary, then you can only aggregate down to the smallest range (possibly by day and user), which is not, I suspect, especially useful. A datamart might be of use to you here, but I'm afraid I have no experience of something like this.

HTH



much appreciated, this helps alot.. thx!

spirit1
Cybernetic Yak Master

Slovenia
11750 Posts

Posted - 04/25/2008 :  14:52:34
In scenarios like this I'd suggest that you don't delete the source data, but transfer it to another database, back up that database, and then drop it. You never know when you might need the old data, and with a backup you can always restore it. If you prefer, you don't even have to transfer the data to another db: just back up the current db and then delete the data from the tables.
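A rough sketch of that approach (the archive database name, the cut-off, and the backup path are all made up for illustration; it assumes Archive_DB.dbo.ImageLog has the same columns but with ImageLogID declared as a plain INT rather than IDENTITY, so the original IDs carry over):

```sql
-- Move aged rows to an archive database before deleting them from the source.
INSERT Archive_DB.dbo.ImageLog (ImageLogID, imageID, DomainID, ServeDate, UserID, Status)
SELECT ImageLogID, imageID, DomainID, ServeDate, UserID, Status
FROM   dbo.ImageLog
WHERE  ServeDate < DATEADD(MONTH, -1, GETDATE())

DELETE dbo.ImageLog
WHERE  ServeDate < DATEADD(MONTH, -1, GETDATE())

-- Then back the archive up so it can be restored later if ever needed.
BACKUP DATABASE Archive_DB
	TO DISK = 'D:\Backups\Archive_DB.bak'
```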

_______________________________________________
Causing trouble since 1980
blog: http://weblogs.sqlteam.com/mladenp
SSMS Add-in that does a few things: www.ssmstoolspack.com

mike123
Flowing Fount of Yak Knowledge

1462 Posts

Posted - 04/25/2008 :  16:00:43
quote:
Originally posted by spirit1

In scenarios like this I'd suggest that you don't delete the source data, but transfer it to another database, back up that database, and then drop it. You never know when you might need the old data, and with a backup you can always restore it. If you prefer, you don't even have to transfer the data to another db: just back up the current db and then delete the data from the tables.

_______________________________________________
Causing trouble since 1980
blog: http://weblogs.sqlteam.com/mladenp
SSMS Add-in that does a few things: www.ssmstoolspack.com



Excellent suggestion, which I definitely plan to follow as well!

Maybe I could have an "archive_DB" with just one table that I keep appending to as I delete data?

thank you once again

mike123
Flowing Fount of Yak Knowledge

1462 Posts

Posted - 05/13/2008 :  19:39:57
quote:
Originally posted by Peso

Have a job that each day aggregates the previous day's "clicks" and stores them in a history table.
Delete the records from the source.

Then put a clustered index over the date column of the history table, with a fill factor of 100%.



I'm planning on integrating this now. I am wondering if there is anything you would do to prevent the job from slowing the system down?

Since each user has their own set of stats, perhaps it would be wise to query per user and then do the insert per user. I could then loop through the list of users, with a delay between loops.

I'm not really sure if this is necessary; perhaps I will have to see first. Any suggestions are greatly appreciated. I don't think there are any 'throttling' features in SQL Server 2005 - I think maybe they are in 2008?
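One way to sketch that per-user batching (purely illustrative; the 2-second delay, the ImageLogHistory table, and whether the pause is needed at all are assumptions that would have to be measured):

```sql
-- Aggregate yesterday's rows one user at a time, pausing between users
-- so the job doesn't monopolise the server.
DECLARE @UserID INT, @Today DATETIME, @Yesterday DATETIME
SELECT  @Today     = CONVERT(CHAR(8), GETDATE(), 112),
        @Yesterday = DATEADD(DAY, -1, CONVERT(CHAR(8), GETDATE(), 112))

SELECT @UserID = MIN(UserID) FROM dbo.ImageLog

WHILE @UserID IS NOT NULL
BEGIN
	-- ImageLogHistory is a hypothetical pre-aggregated history table.
	INSERT dbo.ImageLogHistory (ServeDay, UserID, ImagesServed)
	SELECT @Yesterday, UserID, COUNT(*)
	FROM   dbo.ImageLog
	WHERE  UserID = @UserID
	AND    ServeDate >= @Yesterday AND ServeDate < @Today
	GROUP  BY UserID

	WAITFOR DELAY '00:00:02'	-- brief pause between users

	SELECT @UserID = MIN(UserID) FROM dbo.ImageLog WHERE UserID > @UserID
END
```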

Thanks again,
Mike123
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC