Kristen (Test, 22859 Posts)
Posted - 2010-01-12 : 03:52:13
Hi guys,

Our application has a logging table recording activity about users' web page views and the database procs that are called. It gets several million INSERTs a day, and each INSERT is followed by an UPDATE (with Error / Success Code and End Time). These logging rows are kept for about 7 days, so we also have a housekeeping job that deletes several million rows each day.

We have a dozen or so databases on a server, one per client. Each has the same logging action. The logging data accounts for at least 50% of the client database storage.

The logging data is seldom read (we have a Stats report that aggregates it once a day; other than that we only look at it when investigating a problem). So essentially write-only :)

Here's my question: what are the implications of storing this logging data in a separate database, one per server (I'll add a "Source" column to indicate which database it came from)? Are there any performance implications with the INSERT / UPDATE going to a separate Logging database instead of the local database?

I figure to change the Logging database recovery model to SIMPLE. I don't need to roll back the DELETE statements, and any that get missed will be deleted tomorrow :)

Upside: dramatically reduce the size of client DBs and, more importantly, the size of their backups and TLogs, and adopt a much slimmer backup strategy for the Logging DB. We don't care too much if we lose logging data, and the cost of backups is becoming an issue, so this will reduce that (client DBs will shrink by 50%, and the TLog backups will probably shrink by more than that: each logging row has an insert, an update and a delete, and all are currently logged).

What changes do I need to make? I figure instead of

INSERT INTO dbo.MyLoggingTable (StartDateTime, ProcessName)
SELECT GETDATE(), 'FooBar'

I will change that to

INSERT INTO MyLoggingDatabase.dbo.MyLoggingTable (Source, StartDateTime, ProcessName)
SELECT DB_NAME(), GETDATE(), 'FooBar'

Ditto for the UPDATE. And I will create a VIEW in the client database with the original table's name [referencing the table in the logging database (WHERE Source = 'MyDBName')] so any existing queries continue to run OK.

(This is on SQL2008; are there any new features I could take advantage of?)

Thanks for your input
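The per-client compatibility VIEW described above might look something like this minimal sketch. The column names (EndDateTime, ErrorCode) and the literal 'MyClientDB' are assumptions for illustration; the post only mentions an Error / Success Code and End Time without naming the columns:

```sql
-- Hypothetical sketch: a view in each client database, named after the
-- original table, so existing queries keep working after the move.
-- Filters the shared table down to this client's rows via the Source column.
CREATE VIEW dbo.MyLoggingTable
AS
SELECT StartDateTime, ProcessName, EndDateTime, ErrorCode
FROM MyLoggingDatabase.dbo.MyLoggingTable
WHERE Source = 'MyClientDB';   -- hard-coded per client database
```

Hard-coding the client name (rather than calling DB_NAME() inside the view) keeps the predicate a constant, which makes it easier for the optimizer to use an index on Source.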
SwePeso (Patron Saint of Lost Yaks, 30421 Posts)
Posted - 2010-01-12 : 04:00:57
You could make the new logging table a partitioned table in the new logging database (if you are using Enterprise Edition). That would speed up the SELECT from your view.

N 56°04'39.26" E 12°55'05.63"
Kristen (Test, 22859 Posts)
Posted - 2010-01-12 : 04:03:54
Nah, we're cheapskates using Web edition, but it's a good thought, thanks. If we were using Enterprise I could compress the backups, and that would probably make my problem go away, short term, too!

I think our READ actions are rare enough that I'm not too worried about them - famous last words though.
Lumbago (Norsk Yak Master, 3271 Posts)
Posted - 2010-01-12 : 07:17:20
Is it an option to have a separate logging table for each client? It will double the databases to maintain, but I'd think that having several smaller databases would be better than one huge one. You could also create a custom partitioning scheme on Standard/Web edition that can help you increase performance on selects and deletes. It's a bit more tedious than the Enterprise partitioning, but if you e.g. create a table for each day of logging (log_monday, log_tuesday, etc.) you can also truncate the table with the oldest data instead of deleting. It might not be the optimal solution but I think it's worth considering.

Here's a simple illustration... if you look at the execution plan for the last SELECT you will see that it only queries one of the tables in the view:

CREATE TABLE part_table1 ( ID INT NOT NULL PRIMARY KEY CLUSTERED, SomeValue VARCHAR(20) )
CREATE TABLE part_table2 ( ID INT NOT NULL PRIMARY KEY CLUSTERED, SomeValue VARCHAR(20) )
CREATE TABLE part_table3 ( ID INT NOT NULL PRIMARY KEY CLUSTERED, SomeValue VARCHAR(20) )

ALTER TABLE part_table1 WITH CHECK ADD CONSTRAINT CK_part_table1 CHECK (ID >= 0 AND ID <= 10)
ALTER TABLE part_table1 CHECK CONSTRAINT CK_part_table1
ALTER TABLE part_table2 WITH CHECK ADD CONSTRAINT CK_part_table2 CHECK (ID >= 11 AND ID <= 100)
ALTER TABLE part_table2 CHECK CONSTRAINT CK_part_table2
ALTER TABLE part_table3 WITH CHECK ADD CONSTRAINT CK_part_table3 CHECK (ID >= 101)
ALTER TABLE part_table3 CHECK CONSTRAINT CK_part_table3

CREATE VIEW part_table
AS
SELECT * FROM part_table1
UNION ALL
SELECT * FROM part_table2
UNION ALL
SELECT * FROM part_table3

INSERT INTO part_table SELECT 9, 'sqlteam'
INSERT INTO part_table SELECT 24, 'sqlteam'
INSERT INTO part_table2 SELECT 25, 'sqlteam'

SELECT * FROM part_table WHERE ID = 22

- Lumbago
http://xkcd.com/327/
AndrewMurphy (Master Smack Fu Yak Hacker, 2916 Posts)
Posted - 2010-01-12 : 07:19:07
Or use a different product for backups that compresses the data on the fly... www.red-gate.com?
Kristen (Test, 22859 Posts)
Posted - 2010-01-12 : 08:08:22
@Lumbago: If I have separate table names per client, or the same table name but separate logging databases per client, I have to have custom Sprocs for logging containing the target DB/table name (or some dynamic SQL, which will not be optimal performance-wise). Sooner or later some twit will run the wrong Sproc on the wrong database :( So maintenance-wise I'd prefer to avoid that.

I'm not worried about partitioning for deletion, as we will just delete "everything more than 7 days old", so no need to delete per-client. However, truncating a table instead of deleting has huge appeal :) thanks for that thought.

Could I have a table "Today" that all new log records are created in, and then tables called Day-2, Day-3, ...? Then for housekeeping: DROP Day-7, rename Day-6 to Day-7, ... rename Today to Day-2, and create a new "Today" table. Could I do that so that attempts to insert into Today were blocked whilst I was renaming / recreating that table?

Alternatively I could have tables Day-1 to Day-7, truncate the oldest, and then update all databases with a "now store in Day-n" flag so the logging Sprocs switched to using that table from that point onwards. I'd have to then engineer something to re-set the CHECK CONSTRAINT for each table, so that the VIEW would be partitioned correctly.

We have a Session ID in the log. That could, obviously, be split across multiple partitions. A common query is "WHERE SessionID=1234" (currently an indexed column, of course). Would that become a 7-table-trillion-row scan?

@AndrewMurphy: Thanks. I'd half thought about backup compression. I'm not sure that's the whole answer though; the LDF files are currently getting extended because of the effort of bulk delete and reindex (even though we try to mitigate those with "intelligent" deletion methods, increased frequency of TLog backups during deletions, etc.).

I can't really describe the nuance of my gut feelings on this adequately, but perhaps it's best described by the logging data being nothing like the data in the rest of the database, and I feel they are better handled by two completely different strategies.
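The DROP-and-rename rotation described above could be sketched roughly as follows. This is a hypothetical outline, not tested in the thread; the table names (Today, Day2 ... Day7) and columns are assumptions, and all tables must share the same schema:

```sql
-- Hypothetical daily rotation: drop the oldest table, shuffle the names
-- down, and recreate an empty "Today". Running it inside one transaction
-- means inserts into dbo.Today block on the schema locks until COMMIT,
-- rather than failing because the table is momentarily missing.
BEGIN TRANSACTION;
    DROP TABLE dbo.Day7;                -- oldest day goes away (minimally logged)
    EXEC sp_rename 'dbo.Day6', 'Day7';
    EXEC sp_rename 'dbo.Day5', 'Day6';
    EXEC sp_rename 'dbo.Day4', 'Day5';
    EXEC sp_rename 'dbo.Day3', 'Day4';
    EXEC sp_rename 'dbo.Day2', 'Day3';
    EXEC sp_rename 'dbo.Today', 'Day2';
    CREATE TABLE dbo.Today              -- assumed logging columns
    (
        StartDateTime DATETIME NOT NULL,
        ProcessName   VARCHAR(128) NOT NULL
    );
COMMIT TRANSACTION;
```

One caveat: sp_rename warns that renaming invalidates cached plans and scripts referencing the old name, so any partitioned view over Day2..Day7 should be checked after the rotation, even though the set of names it references is unchanged.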
Lumbago (Norsk Yak Master, 3271 Posts)
Posted - 2010-01-12 : 08:42:28
In regards to custom Sprocs for logging, I'd think that a custom view with the same name as the current logging table would suffice. An example for the customer LumbagoLtd:

CREATE VIEW dbo.logging
AS
SELECT * FROM lumbagoltd_log.dbo.logging_1
UNION ALL
SELECT * FROM lumbagoltd_log.dbo.logging_2
...
SELECT * FROM lumbagoltd_log.dbo.logging_7

By doing this there is no need to change any Sprocs or whatever else is creating these logs. Then in lumbagoltd_log.dbo.logging_1-7 you can have an indexed persisted computed column called MyDayOfWeekColumn or whatever, with the day of week from the timestamp of the logging entry, and a constraint on each table.

Renaming the tables might work, but as you're saying you'll get blocking issues when doing inserts, and you will also need to drop all connections to the table that is being renamed before the renaming happens. Sounds like more of a pain than figuring out which table to delete from. Selecting data is easy, just create a view with the query:

SELECT * FROM dbo.logging WHERE MyDayOfWeekColumn = DATEPART(dw, GETDATE())

- Lumbago
http://xkcd.com/327/
AndrewMurphy (Master Smack Fu Yak Hacker, 2916 Posts)
Posted - 2010-01-12 : 09:15:45
Alternatively, re-examine the purpose of the logging?

Stats? Do you need up-to-the-minute stuff? Can you extrapolate "reasonable values" from "periodic samples"? Can you log in non-SQL terms, i.e. have the web pages create mini flat files (at least this method will avoid the SQL overhead of "logging" the activity actions)? It may be harder to investigate/process the files, but there's no substitute for brainpower when faced with a problem, and I'm sure you could come up with something. Can you divert the "web activity" actions to a summariser task, and only log summary records per period (hour, qtr-hour, etc.)?

Audit trail? Who/why/when does any lookback happen - can it suggest/pay for the solution? Given your deletion policy this seems an unlikely objective.

Problem investigation? Turn on/off as needed... at least this will keep a further lid on the problem - albeit at the sacrifice of needing some problems to be re-created to allow controlled investigation?
Lumbago (Norsk Yak Master, 3271 Posts)
Posted - 2010-01-12 : 10:15:33
Well... I was sort of bored at work and also a little bit intrigued by the problem, so I wrote some code with what I see as a possible solution. The code is totally undocumented but hopefully it's somewhat self-explanatory.

EDIT: Unfortunately I was unable to get LogWkDay as a persisted computed column, since SQL Server regards DATEPART(dw, LogTime) as non-deterministic. Will have to find some sort of work-around if this is a problem.

CREATE DATABASE lumbagoltd
GO
CREATE DATABASE lumbagoltd_log
GO
USE lumbagoltd_log
GO
CREATE TABLE logging_day1 ( ID INT NOT NULL, LogTime DATETIME NOT NULL, LogWkDay INT NOT NULL, LogValue VARCHAR(20), CONSTRAINT PK_logging_day1 PRIMARY KEY CLUSTERED (ID ASC, LogWkDay ASC) )
CREATE TABLE logging_day2 ( ID INT NOT NULL, LogTime DATETIME NOT NULL, LogWkDay INT NOT NULL, LogValue VARCHAR(20), CONSTRAINT PK_logging_day2 PRIMARY KEY CLUSTERED (ID ASC, LogWkDay ASC) )
CREATE TABLE logging_day3 ( ID INT NOT NULL, LogTime DATETIME NOT NULL, LogWkDay INT NOT NULL, LogValue VARCHAR(20), CONSTRAINT PK_logging_day3 PRIMARY KEY CLUSTERED (ID ASC, LogWkDay ASC) )
CREATE TABLE logging_day4 ( ID INT NOT NULL, LogTime DATETIME NOT NULL, LogWkDay INT NOT NULL, LogValue VARCHAR(20), CONSTRAINT PK_logging_day4 PRIMARY KEY CLUSTERED (ID ASC, LogWkDay ASC) )
CREATE TABLE logging_day5 ( ID INT NOT NULL, LogTime DATETIME NOT NULL, LogWkDay INT NOT NULL, LogValue VARCHAR(20), CONSTRAINT PK_logging_day5 PRIMARY KEY CLUSTERED (ID ASC, LogWkDay ASC) )
CREATE TABLE logging_day6 ( ID INT NOT NULL, LogTime DATETIME NOT NULL, LogWkDay INT NOT NULL, LogValue VARCHAR(20), CONSTRAINT PK_logging_day6 PRIMARY KEY CLUSTERED (ID ASC, LogWkDay ASC) )
CREATE TABLE logging_day7 ( ID INT NOT NULL, LogTime DATETIME NOT NULL, LogWkDay INT NOT NULL, LogValue VARCHAR(20), CONSTRAINT PK_logging_day7 PRIMARY KEY CLUSTERED (ID ASC, LogWkDay ASC) )

ALTER TABLE logging_day1 WITH CHECK ADD CONSTRAINT CK_logging_day1 CHECK (LogWkDay = 1)
ALTER TABLE logging_day1 CHECK CONSTRAINT CK_logging_day1
ALTER TABLE logging_day2 WITH CHECK ADD CONSTRAINT CK_logging_day2 CHECK (LogWkDay = 2)
ALTER TABLE logging_day2 CHECK CONSTRAINT CK_logging_day2
ALTER TABLE logging_day3 WITH CHECK ADD CONSTRAINT CK_logging_day3 CHECK (LogWkDay = 3)
ALTER TABLE logging_day3 CHECK CONSTRAINT CK_logging_day3
ALTER TABLE logging_day4 WITH CHECK ADD CONSTRAINT CK_logging_day4 CHECK (LogWkDay = 4)
ALTER TABLE logging_day4 CHECK CONSTRAINT CK_logging_day4
ALTER TABLE logging_day5 WITH CHECK ADD CONSTRAINT CK_logging_day5 CHECK (LogWkDay = 5)
ALTER TABLE logging_day5 CHECK CONSTRAINT CK_logging_day5
ALTER TABLE logging_day6 WITH CHECK ADD CONSTRAINT CK_logging_day6 CHECK (LogWkDay = 6)
ALTER TABLE logging_day6 CHECK CONSTRAINT CK_logging_day6
ALTER TABLE logging_day7 WITH CHECK ADD CONSTRAINT CK_logging_day7 CHECK (LogWkDay = 7)
ALTER TABLE logging_day7 CHECK CONSTRAINT CK_logging_day7
GO
CREATE PROCEDURE dbo.CleanLogTables
AS
BEGIN
    DECLARE @DayOfWeekToDelete int = DATEPART(dw, DATEADD(day, -6, GETDATE()))
    DECLARE @SQL NVARCHAR(2000)
    SET @SQL = 'TRUNCATE TABLE logging_day' + CAST(@DayOfWeekToDelete AS NVARCHAR(1))
    EXEC sp_executesql @SQL
END
GO
USE lumbagoltd
GO
CREATE VIEW logging
AS
SELECT * FROM lumbagoltd_log.dbo.logging_day1
UNION ALL
SELECT * FROM lumbagoltd_log.dbo.logging_day2
UNION ALL
SELECT * FROM lumbagoltd_log.dbo.logging_day3
UNION ALL
SELECT * FROM lumbagoltd_log.dbo.logging_day4
UNION ALL
SELECT * FROM lumbagoltd_log.dbo.logging_day5
UNION ALL
SELECT * FROM lumbagoltd_log.dbo.logging_day6
UNION ALL
SELECT * FROM lumbagoltd_log.dbo.logging_day7
GO
CREATE VIEW logging_today
AS
SELECT * FROM logging WHERE LogWkDay = DATEPART(dw, GETDATE())
GO
INSERT INTO logging
SELECT 1, GETDATE(), DATEPART(dw, GETDATE()), 'Something' UNION ALL
SELECT 2, '2010-01-12 15:36:58', DATEPART(dw, '2010-01-12 15:36:58'), 'Something' UNION ALL
SELECT 3, '2010-01-14 15:36:58', DATEPART(dw, '2010-01-14 15:36:58'), 'Something' UNION ALL
SELECT 4, '2010-01-21 15:36:58', DATEPART(dw, '2010-01-21 15:36:58'), 'Something' UNION ALL
SELECT 5, '2010-01-20 15:36:58', DATEPART(dw, '2010-01-20 15:36:58'), 'Something' UNION ALL
SELECT 6, '2010-01-08 15:36:58', DATEPART(dw, '2010-01-08 15:36:58'), 'Something' UNION ALL
SELECT 7, '2010-01-30 15:36:58', DATEPART(dw, '2010-01-30 15:36:58'), 'Something' UNION ALL
SELECT 8, '2010-01-11 15:36:58', DATEPART(dw, '2010-01-11 15:36:58'), 'Something'

SELECT * FROM logging
GO
EXEC lumbagoltd_log.dbo.CleanLogTables
GO
SELECT * FROM logging
GO
--> Cleanup
USE master
GO
DROP DATABASE lumbagoltd
DROP DATABASE lumbagoltd_log

- Lumbago
http://xkcd.com/327/
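[Editor's note: one possible work-around for the non-deterministic DATEPART(dw, ...) mentioned in the EDIT above, offered as an assumption rather than something tested in this thread. Day 0 of SQL Server's DATETIME calendar (1900-01-01) was a Monday, so a DATEDIFF-based expression gives a day-of-week number that does not depend on SET DATEFIRST, making it deterministic and therefore eligible to be PERSISTED and indexed:]

```sql
-- Deterministic day-of-week: (DATEDIFF(DAY, 0, LogTime) % 7) + 1
-- yields 1 = Monday ... 7 = Sunday regardless of the DATEFIRST setting,
-- so SQL Server allows it as a PERSISTED computed column.
CREATE TABLE logging_day1
(
    ID       INT NOT NULL,
    LogTime  DATETIME NOT NULL,
    LogWkDay AS (DATEDIFF(DAY, 0, LogTime) % 7) + 1 PERSISTED NOT NULL,
    LogValue VARCHAR(20),
    CONSTRAINT PK_logging_day1 PRIMARY KEY CLUSTERED (ID ASC, LogWkDay ASC)
);
```

Note the trade-off: SQL Server's updatable partitioned views do not allow the partitioning column to be computed, so with this design inserts would have to target the base tables directly rather than go through the UNION ALL view, and the CleanLogTables arithmetic would need to use the same Monday-based numbering.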
Kristen (Test, 22859 Posts)
Posted - 2010-01-12 : 10:49:04
@Lumbago: Edit: my post crossed with yours, will look at yours now.

Can I INSERT into that VIEW? As efficiently as INSERT INTO lumbagoltd_log.dbo.logging_1 (Col1, ...)? If not, I think I'd prefer to explicitly name the table in the logging Sproc. I can see the benefit of the VIEW for queries though.

@AndrewMurphy: Yup, it would be timely to reconsider the logging. Two purposes: debugging and stats.

For debugging we log every user action, web page request, Sprocs that are run, when/if the user's browser finishes rendering the page (by a "callback") and so on. Everything we can think of to log, we do. Most of this data originates in SQL Server (the page has to get Session State data from the DB; as a side effect of that we log the page view). The stuff that Sprocs want to log would be harder (I think?) to log to a flat file, and I reckon it would be much harder to then make use of that data - my primary skill is SQL, and when confronted with "where / how often is this happening" I am agile with a SQL query against log tables that I am familiar with.

The perceived problem with the Web is that users rarely report problems, and with high numbers of visitors any emerging problems should be resolved "before users alert us". So there is no real opportunity to "turn on when needed". If MS add an ESP module that turns on logging just-in-time, that would be good.

So we log all this stuff on the off chance that we need it. Probably about 1 session a day is investigated, on average. With a major problem we'd look at several. A mailing with an incorrect URL to a missing image can be fixed by copying the image to where it is needed, or "faking" the URL. (Web logs would get us that too, but probably not "until tomorrow", by which time the mailing campaign is stale...)

We do report on the logs for things that have an Error code (could tackle that by using a Trigger to duplicate those into an UrgentLog table).

Separately we use the data for stats: aggregating numbers of hits per page, the quartile performance (elapsed time) per page and a few other things; any other web stats are done with Google Analytics or similar (don't want to overload my servers when clients can get the data for free elsewhere).

So I'm loath to dispense with this logging data, as it enables us to be very agile in providing good end-customer service to our clients. But 99% of the stuff is never used, and it is a large (for us) volume of data. I'm open to ideas though.
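[Editor's note: the trigger idea mentioned in passing above could be sketched as follows. This is a hypothetical illustration; the table and column names (MyLoggingTable, UrgentLog, ErrorCode, Source) are assumptions, as is the convention that a non-zero ErrorCode means failure:]

```sql
-- Hypothetical sketch: when the end-of-request UPDATE stamps a row with
-- an error code, copy that row into a small UrgentLog table so error
-- reporting need not scan the full logging table.
CREATE TRIGGER trg_MyLoggingTable_Error
ON dbo.MyLoggingTable
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.UrgentLog (Source, StartDateTime, ProcessName, ErrorCode)
    SELECT i.Source, i.StartDateTime, i.ProcessName, i.ErrorCode
    FROM inserted AS i
    WHERE i.ErrorCode <> 0;   -- assumed: 0 = success, anything else = error
END;
```

Since UrgentLog would hold only the rare error rows, it could live in the fully backed-up client database even if the bulk logging moves to the slim-backup logging database.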
SwePeso (Patron Saint of Lost Yaks, 30421 Posts)
Kristen (Test, 22859 Posts)
Posted - 2010-01-12 : 13:58:42
Very useful link, thanks Peso