SQL Server Forums
SQL Server Administration (2005)

Topic: query performance strategy

mike123
Flowing Fount of Yak Knowledge

1462 Posts

Posted - 04/18/2008 :  08:20:01
Hi,

I have just created a logging table that I anticipate will have tens of millions of rows (maybe hundreds of millions eventually).

Basically it's a very narrow table; we are using it to log hits on images for a webserver.

My question is that we want to run queries that show how many rows are logged per day etc., but we want to make sure these queries, which we anticipate will be very heavy, do not slow down the system.

I have been recommended to have a separate database (mirror/replica) for reporting so that the performance of regular activity will not be affected.

I assume this means I would need another server for this other database?

I am thinking there are probably some alternative solutions to this as well. Getting a dedicated server just for these queries really isn't an option.

To improve performance it is not a problem to make some sacrifices. For example, having the data update every 15 minutes is more than acceptable.

I see certain websites I use employ this strategy of making the data update every 15 minutes, but I am unsure what is going on behind the scenes. The queries are lightning fast when run, so I am thinking they have some sort of table populated with precomputed data, which is quick to query.


Any thoughts or suggestions to give me some direction, are very much appreciated !

thanks once again,
Mike123

pootle_flump
Flowing Fount of Yak Knowledge

United Kingdom
1064 Posts

Posted - 04/18/2008 :  09:52:33
Hi Mike

Can you show the queries you will use (particularly what you will filter on)? If the table is so narrow then presumably there can't be many. Also, are you pulling up singletons or ranges?

Lumbago
Norsk Yak Master

Norway
3271 Posts

Posted - 04/18/2008 :  09:57:50
To be honest I didn't really understand all of your requirements, but what I can say is that caching can be a very good friend. If you can crunch your data once every 15 minutes and populate the results into a caching table, it will be a lot faster to query than running the full query for each request to a website or whatever.

You could also give partitioning a glance; look up "partitioned tables" in BOL. It's only supported in Enterprise Edition, but you can easily create your own custom version using partitioned views. Sharing the data load between tables can be a real performance booster if done right.
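As a minimal sketch of the caching-table idea (all names here - the source table, the cache table, its columns - are illustrative, not from the thread, and the 15-minute schedule would be a SQL Agent job):

```sql
-- Hypothetical summary table, rebuilt every 15 minutes by a SQL Agent job.
CREATE TABLE dbo.ImageStatsCache
	(
		StatDate	DATETIME	NOT NULL,
		UserID		INT		NOT NULL,
		ImageCount	INT		NOT NULL,
		CONSTRAINT pk_ImageStatsCache PRIMARY KEY CLUSTERED (StatDate, UserID)
	)
GO

-- Refresh step: recompute today's counts and swap them in.
-- CONVERT(CHAR(8), ..., 112) truncates a datetime to its date ('yyyymmdd').
BEGIN TRANSACTION
	DELETE dbo.ImageStatsCache
	WHERE  StatDate = CONVERT(CHAR(8), GETDATE(), 112)

	INSERT dbo.ImageStatsCache (StatDate, UserID, ImageCount)
	SELECT CONVERT(CHAR(8), ServeDate, 112), UserID, COUNT(*)
	FROM   dbo.ImageLog
	WHERE  ServeDate >= CONVERT(CHAR(8), GETDATE(), 112)
	GROUP  BY CONVERT(CHAR(8), ServeDate, 112), UserID
COMMIT
```

The website then reads from the small cache table instead of scanning the raw log on every request.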



--
Lumbago

Edited by - Lumbago on 04/18/2008 09:58:33

pootle_flump
Flowing Fount of Yak Knowledge

United Kingdom
1064 Posts

Posted - 04/18/2008 :  09:58:11
Sorry - reread your question.

CREATE TABLE dbo.logging
	(
		datetimeinserted	DATETIME NOT NULL			CONSTRAINT df_logging_datetimeinserted DEFAULT GETDATE()
		, uniqifier			INT IDENTITY(1, 1) NOT NULL
		, datacol			SMALLINT
		, CONSTRAINT pk_logging		PRIMARY KEY CLUSTERED (datetimeinserted, uniqifier) WITH (FILLFACTOR = 100)
	)
GO
Inserts are as fast as they can be (monotonically increasing clustered index - faster than a heap). A range scan on date as you describe would be rapid: a single seek to the start of the range and a scan from there.
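For example, a daily-count query over that table (the date boundaries here are purely illustrative) can seek straight to the start of the range on the clustered index:

```sql
-- Hits per day for a one-week range; an illustrative query, not from the thread.
SELECT CONVERT(CHAR(8), datetimeinserted, 112)	AS LogDate,
       COUNT(*)					AS Hits
FROM   dbo.logging
WHERE  datetimeinserted >= '20080401'
AND    datetimeinserted <  '20080408'
GROUP  BY CONVERT(CHAR(8), datetimeinserted, 112)
ORDER  BY LogDate
```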

mike123
Flowing Fount of Yak Knowledge

1462 Posts

Posted - 04/18/2008 :  15:34:05
quote:
Originally posted by pootle_flump

Hi Mike

Can you show the queries you will use (particularly what you will filter on)? If the table is so narrow then presumably there can't be many. Also, are you pulling up singletons or ranges?



Hi Pootle,

I want to run queries that show the usage history for each user. I have included the table below.

Basically we want to return result sets that have a row for each day and show the counts of total images served (possibly another one that shows images served by domain etc.). Basically a lot of COUNT(*)s and GROUP BYs.

I haven't written the query yet; if this is too general please let me know and I'll come up with something. I'm just going for a general strategy here, as I haven't really seen any strategies where people use tables to store precomputed stats, although I am sure they exist.


For example, if the user serves 501,343 images yesterday, that number will never change. Perhaps we could store it somewhere in the database. Perhaps we could even auto-populate it after midnight.

Just an idea, really would love to hear feedback from whatever people have done in this type of situation before.

Thanks again! much appreciated
mike123






CREATE TABLE [dbo].[ImageLog]
	(
		[ImageLogID]	[int] IDENTITY(1,1)	NOT NULL,
		[imageID]	[int]			NULL,
		[DomainID]	[int]			NULL,
		[ServeDate]	[datetime]		NOT NULL,
		[UserID]	[int]			NULL,
		[Status]	[char](1)		NULL
	)
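The kind of query described above might look something like this (a sketch only - the exact grouping columns would depend on the report):

```sql
-- One row per user per day, with total images served.
SELECT UserID,
       CONVERT(CHAR(8), ServeDate, 112)	AS ServeDay,
       COUNT(*)				AS ImagesServed
FROM   dbo.ImageLog
GROUP  BY UserID, CONVERT(CHAR(8), ServeDate, 112)
ORDER  BY UserID, ServeDay
```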








mike123
Flowing Fount of Yak Knowledge

1462 Posts

Posted - 04/18/2008 :  15:37:49
quote:
Originally posted by Lumbago

To be honest I didn't really understand all of your requirements, but what I can say is that caching can be a very good friend. If you can crunch your data once every 15 minutes and populate the results into a caching table, it will be a lot faster to query than running the full query for each request to a website or whatever.

You could also give partitioning a glance; look up "partitioned tables" in BOL. It's only supported in Enterprise Edition, but you can easily create your own custom version using partitioned views. Sharing the data load between tables can be a real performance booster if done right.



--
Lumbago



Hi Lumbago,

I haven't used horizontal table partitioning before, but this looks like it could be the time to use it. I'm not sure what to expect, but I definitely will be looking into it.

I definitely plan to cache on the web application side, so we minimize the number of times the queries get run. But I am not sure how effective that's going to be.

I like the way this database caching sounds. How exactly would this work?

In my above post, I mention an example:

If user 1 serves 324,234 images on 1/1/2008, we know the count for that date for total images served. That will never increase or decrease, so we could store it in a table somewhere. Maybe even auto-populate this table after midnight, so it runs especially fast the next day. This could get pretty complicated for different types of queries though, as the queries would get more complicated than just COUNT(*) WHERE userID = 1.

Any thoughts on this ? your input is greatly appreciated

Thanks very much!
mike123

Edited by - mike123 on 04/18/2008 15:38:18

pootle_flump
Flowing Fount of Yak Knowledge

United Kingdom
1064 Posts

Posted - 04/19/2008 :  15:14:02
You can absolutely prepopulate. The main problem is the "etc." bit. There is no such thing as an optimal indexing strategy without considering the queries you will run against the table. It's no good having an index on a column that never appears in a WHERE predicate (or join, or GROUP BY, or ORDER BY...).

It sounds like you will always be looking for an aggregation of data between time x and time y which, for starters, means you would want the dateinserted column to be the leading column of either the clustered index or a covering index for that particular query. Would you be looking at one user at a time or all users? Grouping by user or lumping them all together? And so on.

Pre-aggregating is a great strategy; however, if the date range you use for your queries might vary, then you can only aggregate down to the smallest range (possibly by day and user), which is not, I suspect, especially useful. A datamart might be of use to you here, but I'm afraid I have no experience of something like this.
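To sketch the indexing point: assuming the queries filter on ServeDate and group by UserID (an assumption - the actual queries haven't been posted), a covering index for the ImageLog table might look like:

```sql
-- Illustrative only: ServeDate leads so a date-range predicate can seek,
-- and UserID is a key column so the grouping query never touches the base table.
CREATE NONCLUSTERED INDEX ix_ImageLog_ServeDate_UserID
	ON dbo.ImageLog (ServeDate, UserID)
```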

HTH

SwePeso
Patron Saint of Lost Yaks

Sweden
30102 Posts

Posted - 04/19/2008 :  17:52:00
Have a job that each day aggregates the previous day's "clicks" and stores them in a history table.
Delete the records from the source.

Then put a clustered index over the date column of the history table, with a fill factor of 100%.
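A sketch of that daily job, assuming the ImageLog table posted earlier and a hypothetical ImageLogHistory table (the names and the day+user grain are assumptions, not from the post):

```sql
-- History table: one row per user per day, clustered on date, fill factor 100.
CREATE TABLE dbo.ImageLogHistory
	(
		ServeDay	DATETIME	NOT NULL,
		UserID		INT		NOT NULL,
		ImagesServed	INT		NOT NULL,
		CONSTRAINT pk_ImageLogHistory
			PRIMARY KEY CLUSTERED (ServeDay, UserID) WITH (FILLFACTOR = 100)
	)
GO

-- Daily job body (e.g. a SQL Agent job scheduled just after midnight).
DECLARE @Today DATETIME, @Yesterday DATETIME
SELECT  @Today     = CONVERT(CHAR(8), GETDATE(), 112),
        @Yesterday = DATEADD(DAY, -1, CONVERT(CHAR(8), GETDATE(), 112))

BEGIN TRANSACTION
	-- Aggregate yesterday's rows into the history table...
	INSERT dbo.ImageLogHistory (ServeDay, UserID, ImagesServed)
	SELECT @Yesterday, UserID, COUNT(*)
	FROM   dbo.ImageLog
	WHERE  ServeDate >= @Yesterday AND ServeDate < @Today
	GROUP  BY UserID

	-- ...then remove them from the source so it stays small.
	DELETE dbo.ImageLog
	WHERE  ServeDate >= @Yesterday AND ServeDate < @Today
COMMIT
```

A fill factor of 100 makes sense here because rows are only ever appended at the end of the date order, so page splits in the middle of the index are not a concern.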



E 12°55'05.25"
N 56°04'39.16"

mike123
Flowing Fount of Yak Knowledge

1462 Posts

Posted - 04/25/2008 :  12:28:36
quote:
Originally posted by Peso

Have a job that each day aggregates the previous day's "clicks" and stores them in a history table.
Delete the records from the source.

Then put a clustered index over the date column of the history table, with a fill factor of 100%.



E 12°55'05.25"
N 56°04'39.16"




this sounds like exactly what I need, thanks so much for the suggestion! I will try this out and let you know how it goes!

I'll try to post more pertinent info such as table design and queries etc, when I get to that point.

thanks again :D
mike123




mike123
Flowing Fount of Yak Knowledge

1462 Posts

Posted - 04/25/2008 :  12:29:23
quote:
Originally posted by pootle_flump

You can absolutely prepopulate. The main problem is the "etc." bit. There is no such thing as an optimal indexing strategy without considering the queries you will run against the table. It's no good having an index on a column that never appears in a WHERE predicate (or join, or GROUP BY, or ORDER BY...).

It sounds like you will always be looking for an aggregation of data between time x and time y which, for starters, means you would want the dateinserted column to be the leading column of either the clustered index or a covering index for that particular query. Would you be looking at one user at a time or all users? Grouping by user or lumping them all together? And so on.

Pre-aggregating is a great strategy; however, if the date range you use for your queries might vary, then you can only aggregate down to the smallest range (possibly by day and user), which is not, I suspect, especially useful. A datamart might be of use to you here, but I'm afraid I have no experience of something like this.

HTH



much appreciated, this helps alot.. thx!

spirit1
Cybernetic Yak Master

Slovenia
11750 Posts

Posted - 04/25/2008 :  14:52:34
In scenarios like this I'd suggest that you don't delete the source data, but transfer it to another database, back up that database, and then drop it. You never know when you might need the old data, and with a backup you can always restore it. If you prefer, you don't even have to transfer the data to another db: just back up the current db and then delete the data from the tables.
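A rough sketch of that approach (the archive database name, the cut-off, and the backup path are all made up for illustration; it assumes Archive_DB.dbo.ImageLog has the same columns but with ImageLogID declared as a plain INT rather than IDENTITY, so the original IDs carry over):

```sql
-- Move aged rows to an archive database before deleting them from the source.
INSERT Archive_DB.dbo.ImageLog (ImageLogID, imageID, DomainID, ServeDate, UserID, Status)
SELECT ImageLogID, imageID, DomainID, ServeDate, UserID, Status
FROM   dbo.ImageLog
WHERE  ServeDate < DATEADD(MONTH, -1, GETDATE())

DELETE dbo.ImageLog
WHERE  ServeDate < DATEADD(MONTH, -1, GETDATE())

-- Then back the archive up so it can be restored later if ever needed.
BACKUP DATABASE Archive_DB
	TO DISK = 'D:\Backups\Archive_DB.bak'
```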

_______________________________________________
Causing trouble since 1980
blog: http://weblogs.sqlteam.com/mladenp
SSMS Add-in that does a few things: www.ssmstoolspack.com

mike123
Flowing Fount of Yak Knowledge

1462 Posts

Posted - 04/25/2008 :  16:00:43
quote:
Originally posted by spirit1

In scenarios like this I'd suggest that you don't delete the source data, but transfer it to another database, back up that database, and then drop it. You never know when you might need the old data, and with a backup you can always restore it. If you prefer, you don't even have to transfer the data to another db: just back up the current db and then delete the data from the tables.

_______________________________________________
Causing trouble since 1980
blog: http://weblogs.sqlteam.com/mladenp
SSMS Add-in that does a few things: www.ssmstoolspack.com



Excellent suggestion, which I definitely plan to follow as well!

Maybe I could have an "archive_DB" with just one table that I keep appending to as I delete data?

thank you once again

mike123
Flowing Fount of Yak Knowledge

1462 Posts

Posted - 05/13/2008 :  19:39:57
quote:
Originally posted by Peso

Have a job that each day aggregates the previous day's "clicks" and stores them in a history table.
Delete the records from the source.

Then put a clustered index over the date column of the history table, with a fill factor of 100%.



I'm planning on integrating this now. I am wondering if there is anything you would do to prevent the job from slowing the system down?

Since each user has their own set of stats, perhaps it would be wise to query per user and then do the insert per user. I could then loop through the list of users, with a delay between loops.

I'm not really sure if this is necessary; perhaps I will have to see first. Any suggestions are greatly appreciated. I don't think there are any 'throttling' features in SQL Server 2005 - I think maybe they are in 2008?
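One way to sketch that per-user batching (purely illustrative; the 2-second delay, the ImageLogHistory table, and whether the pause is needed at all are assumptions that would have to be measured):

```sql
-- Aggregate yesterday's rows one user at a time, pausing between users
-- so the job doesn't monopolise the server.
DECLARE @UserID INT, @Today DATETIME, @Yesterday DATETIME
SELECT  @Today     = CONVERT(CHAR(8), GETDATE(), 112),
        @Yesterday = DATEADD(DAY, -1, CONVERT(CHAR(8), GETDATE(), 112))

SELECT @UserID = MIN(UserID) FROM dbo.ImageLog

WHILE @UserID IS NOT NULL
BEGIN
	-- ImageLogHistory is a hypothetical pre-aggregated history table.
	INSERT dbo.ImageLogHistory (ServeDay, UserID, ImagesServed)
	SELECT @Yesterday, UserID, COUNT(*)
	FROM   dbo.ImageLog
	WHERE  UserID = @UserID
	AND    ServeDate >= @Yesterday AND ServeDate < @Today
	GROUP  BY UserID

	WAITFOR DELAY '00:00:02'	-- brief pause between users

	SELECT @UserID = MIN(UserID) FROM dbo.ImageLog WHERE UserID > @UserID
END
```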

Thanks again,
Mike123
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC