
 Log to Current Database or to Logging Database?


Kristen
Test

22859 Posts

Posted - 2010-01-12 : 03:52:13
Hi guys,

Our application has a logging table recording users' web page views and the database procs that are called. It gets several million INSERTs a day, and each INSERT is followed by an UPDATE (with Error / Success Code and End Time).

These logging rows are kept for about 7 days - so we also have a housekeeping job that deletes several million rows each day.

We have a dozen or so databases on a server - one per client. Each has the same logging action.

The logging data accounts for at least 50% of the client database storage.

The logging data is seldom read (we have a Stats report that aggregates it once a day, other than that we only look at it when investigating a problem). So essentially write-only :)

Here's my question:

What are the implications of storing this logging data in a separate database - one per server (I'll add a "SOURCE" column to indicate which database each row came from)?

Are there any performance implications with the INSERT / UPDATE to a separate Logging database instead of the local database?

I figure I'll change the Logging database recovery model to SIMPLE - I don't need to roll back the DELETE statements, and any rows that get missed will be deleted tomorrow :)
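(For the record, the change itself would just be something like this - the database name is illustrative:)

```sql
-- Switch the logging database to SIMPLE recovery, so the log truncates
-- on checkpoint and bulk DELETEs don't bloat the TLog backups
ALTER DATABASE MyLoggingDatabase SET RECOVERY SIMPLE;
```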

Upside: Dramatically reduce the size of Client DBs, and more importantly the size of their backups and TLogs.

Adopt a much slimmer backup strategy for the Logging DB. We don't care too much if we lose logging data, and the cost of backups is becoming an issue, so this will reduce that (client DB will reduce by 50%, the TLog backups will probably reduce by more than that - each Logging row has an insert, an update and a delete and all are currently logged).

What changes do I need to make? I figure instead of

INSERT INTO dbo.MyLoggingTable(StartDateTime, ProcessName)
SELECT GetDate(), 'FooBar'

I will change that to

INSERT INTO MyLoggingDatabase.dbo.MyLoggingTable(Source, StartDateTime, ProcessName)
SELECT DB_name(), GetDate(), 'FooBar'

Ditto for the update

and I will create a VIEW in the client database with the original table's name [referencing the table in the logging database (WHERE source = 'MyDBName')] so any existing queries continue to run OK
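As a sketch of what I mean (names illustrative, and the view's column list would need to match the real table):

```sql
-- In each client database: a view with the original table's name,
-- filtered to that client's rows, so existing queries keep working
CREATE VIEW dbo.MyLoggingTable
AS
SELECT StartDateTime, ProcessName   -- plus the other original columns
FROM MyLoggingDatabase.dbo.MyLoggingTable
WHERE Source = 'MyDBName';          -- hard-coded per client database
GO
```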

(This is on SQL2008, are there any new features I could take advantage of?)

Thanks for your input

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-01-12 : 04:00:57
You could make a partitioned table in the new logging database (if you are using Enterprise Edition).
That would speed up the SELECTs from your view.



N 56°04'39.26"
E 12°55'05.63"

Kristen
Test

22859 Posts

Posted - 2010-01-12 : 04:03:54
Nah, we're cheapskates using Web Edition, but it's a good thought, thanks. If we were using Enterprise I could compress the backups, and that would probably make my problem go away, short term, too!

I think our READ actions are rare enough I'm not too worried about them - Famous Last Words though.

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2010-01-12 : 07:17:20
Is it an option to have a separate logging table for each client? It will double the number of databases to maintain, but I'd think that having several smaller databases would be better than one huge one. You could also create a custom partitioning scheme on Standard/Web Edition that can help you increase performance on SELECTs and DELETEs. It's a bit more tedious than the Enterprise partitioning, but if you for example create a table for each day of logging (log_monday, log_tuesday, etc.) you can also truncate the table with the oldest data instead of deleting. It might not be the optimal solution but I think it's worth considering.

Here's a simple illustration...if you look at the execution plan for the last select you will see that it only queries one of the tables in the view:
CREATE TABLE part_table1 (
ID INT NOT NULL PRIMARY KEY CLUSTERED,
SomeValue VARCHAR(20)
)
CREATE TABLE part_table2 (
ID INT NOT NULL PRIMARY KEY CLUSTERED,
SomeValue VARCHAR(20)
)
CREATE TABLE part_table3 (
ID INT NOT NULL PRIMARY KEY CLUSTERED,
SomeValue VARCHAR(20)
)

ALTER TABLE part_table1 WITH CHECK ADD CONSTRAINT CK_part_table1 CHECK (ID >= 0 AND ID <= 10)
ALTER TABLE part_table1 CHECK CONSTRAINT CK_part_table1

ALTER TABLE part_table2 WITH CHECK ADD CONSTRAINT CK_part_table2 CHECK (ID >= 11 AND ID <= 100)
ALTER TABLE part_table2 CHECK CONSTRAINT CK_part_table2

ALTER TABLE part_table3 WITH CHECK ADD CONSTRAINT CK_part_table3 CHECK (ID >= 101)
ALTER TABLE part_table3 CHECK CONSTRAINT CK_part_table3

CREATE VIEW part_table
AS
SELECT * FROM part_table1
UNION ALL
SELECT * FROM part_table2
UNION ALL
SELECT * FROM part_table3

INSERT INTO part_table SELECT 9, 'sqlteam'
INSERT INTO part_table SELECT 24, 'sqlteam'
INSERT INTO part_table2 SELECT 25, 'sqlteam'

SELECT * FROM part_table WHERE ID = 22


- Lumbago
http://xkcd.com/327/

AndrewMurphy
Master Smack Fu Yak Hacker

2916 Posts

Posted - 2010-01-12 : 07:19:07
or use a different product for backups that compress the data on the fly....www.red-gate.com?

Kristen
Test

22859 Posts

Posted - 2010-01-12 : 08:08:22
@Lumbago: If I have separate table-names per client, or same table-name but separate logging-databases per client, I have to have custom Sprocs for Logging containing the target DB/table name (or some dynamic SQL, which will not be optimal performance-wise). Sooner or later some twit will run the wrong SProc on the wrong database :( So maintenance-wise I'd prefer to avoid that.

I'm not worried about partitioning for deletion as we will just delete everything more than 7 days old, so no need to delete per-client.

However, truncating a table, instead of deleting, has huge appeal :) thanks for that thought.

Could I have a table "Today" that all new log records are created in, and then tables called Day-2, Day-3, ...

Then for housekeeping DROP Day-7, Rename Day-6 to Day-7 ... Rename Today to Day-2, and create a new "Today" table?

Could I do that so that attempts to insert into Today were blocked whilst I was renaming / recreating that table?

Alternatively I could have tables Day-1 to Day-7 and truncate Oldest and then update all databases with a "Now store in Day-n" flag so the Logging Sprocs switched to using that table from that point onwards.

I'd have to then engineer something to re-set the CHECK CONSTRAINT for each table, so that the VIEW would be partitioned correctly.
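Something like this, I suppose (a rough sketch only - table and column names are made up, and I'd want to verify the locking behaviour):

```sql
-- Rotate the daily log tables: drop the oldest, shift the rest down one,
-- then recreate an empty "Today" table. Wrapping the DDL in a transaction
-- holds schema locks, which should block concurrent inserts while renaming.
BEGIN TRANSACTION;
DROP TABLE dbo.Log_Day7;
EXEC sp_rename 'dbo.Log_Day6', 'Log_Day7';
-- ... repeat for Day5 -> Day6 down to Day3 -> Day4 ...
EXEC sp_rename 'dbo.Log_Day2', 'Log_Day3';
EXEC sp_rename 'dbo.Log_Today', 'Log_Day2';
CREATE TABLE dbo.Log_Today
(
    StartDateTime DATETIME NOT NULL,
    ProcessName   VARCHAR(50) NOT NULL
    -- plus the other logging columns
);
COMMIT TRANSACTION;
```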

We have a Session ID in the log. That could, obviously, be split across multiple partitions. A common query is "WHERE SessionID=1234" (currently an indexed column, of course). Would that become a 7-table-trillion-row-scan?

@AndrewMurphy: Thanks. I'd half thought about backup compression. I'm not sure that's the whole answer though; the LDF files are currently getting extended because of the effort of the bulk DELETEs and reindexing (even though we try to mitigate those with "intelligent" deletion methods, increased frequency of TLog backups during deletions, etc.).

I can't really describe the nuance of my gut feelings on this adequately, but perhaps it's best described by the logging data being nothing like the data in the rest of the database, and I feel the two are better handled by completely different strategies.

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2010-01-12 : 08:42:28
Regarding custom sprocs for logging, I'd think that a custom view with the same name as the current logging table would suffice. An example for the customer LumbagoLtd:
create view dbo.logging
as
select * from lumbagoltd_log.dbo.logging_1
union all
select * from lumbagoltd_log.dbo.logging_2
...
select * from lumbagoltd_log.dbo.logging_7
By doing this there is no need to change any sprocs or whatever else is creating these logs. Then in lumbagoltd_log.dbo.logging_1-7 you can have an indexed, persisted computed column called MyDayOfWeekColumn (or whatever) holding the day of week from the timestamp of the logging entry, and a CHECK constraint on each table.

Renaming the tables might work, but as you're saying you'll get blocking issues when doing inserts, and you will also need to drop all connections to the table that is being renamed before the renaming happens. Sounds like more of a pain than figuring out which table to delete from. Selecting data is easy, just create a view with the query: SELECT * FROM dbo.logging WHERE MyDayOfWeekColumn = DATEPART(dw, GETDATE())

- Lumbago
http://xkcd.com/327/

AndrewMurphy
Master Smack Fu Yak Hacker

2916 Posts

Posted - 2010-01-12 : 09:15:45
Alternatively re-examine the purpose of the logging?

Stats? Do you need up-to-the-minute stuff? Can you extrapolate "reasonable values" from "periodic samples"? Can you log in non-SQL terms, i.e. have the web pages create mini flat files (at least this method will avoid the SQL overhead of "logging" the activity actions), albeit it may be harder to investigate/process the files... but there's no substitute for brainpower when faced with a problem and I'm sure you could come up with something.

Can you divert the "web activity" actions to a summariser task, and only log summary records per period (hour, qtr-hour, etc?)

Audit trail? Who/why/when does any lookback happen - can it suggest/pay for the solution? Given your deletion policy this seems an unlikely objective.

Problem investigation? Turn on/off as needed....at least this will keep a further lid on the problem - albeit at a sacrifice of needing some problems to be re-created to allow controlled investigation?

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2010-01-12 : 10:15:33
Well... I was sort of bored at work and also a little bit intrigued by the problem, so I wrote some code with what I see as a possible solution. The code is totally undocumented but hopefully it's somewhat self-explanatory:

EDIT: Unfortunately I was unable to get the LogWkDay as a persisted computed column since SQL Server regards DATEPART(dw, LogTime) as non-deterministic. Will have to find some sort of work-around if this is a problem.
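A possible work-around sketch (untested here, so treat it as an assumption): DATEPART(dw, ...) depends on the session's SET DATEFIRST, which is why it counts as non-deterministic, but DATEDIFF against a fixed base date is deterministic and can be persisted:

```sql
-- DATEDIFF from day 0 (1900-01-01, a Monday) is deterministic, so the
-- computed column can be PERSISTED: this yields 1 = Monday ... 7 = Sunday
CREATE TABLE logging_demo (
ID INT NOT NULL,
LogTime DATETIME NOT NULL,
LogWkDay AS (DATEDIFF(day, 0, LogTime) % 7) + 1 PERSISTED,
LogValue VARCHAR(20))
```

Note the numbering differs from DATEPART(dw, ...) under the default DATEFIRST, so the CHECK constraints would just need to use the same convention.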
CREATE DATABASE lumbagoltd
GO

CREATE DATABASE lumbagoltd_log
GO

USE lumbagoltd_log
GO

CREATE TABLE logging_day1 (
ID INT NOT NULL,
LogTime DATETIME NOT NULL,
LogWkDay INT NOT NULL,
LogValue VARCHAR(20),
CONSTRAINT PK_logging_day1 PRIMARY KEY CLUSTERED (ID ASC, LogWkDay ASC))

CREATE TABLE logging_day2 (
ID INT NOT NULL,
LogTime DATETIME NOT NULL,
LogWkDay INT NOT NULL,
LogValue VARCHAR(20),
CONSTRAINT PK_logging_day2 PRIMARY KEY CLUSTERED (ID ASC, LogWkDay ASC))

CREATE TABLE logging_day3 (
ID INT NOT NULL,
LogTime DATETIME NOT NULL,
LogWkDay INT NOT NULL,
LogValue VARCHAR(20),
CONSTRAINT PK_logging_day3 PRIMARY KEY CLUSTERED (ID ASC, LogWkDay ASC))

CREATE TABLE logging_day4 (
ID INT NOT NULL,
LogTime DATETIME NOT NULL,
LogWkDay INT NOT NULL,
LogValue VARCHAR(20),
CONSTRAINT PK_logging_day4 PRIMARY KEY CLUSTERED (ID ASC, LogWkDay ASC))

CREATE TABLE logging_day5 (
ID INT NOT NULL,
LogTime DATETIME NOT NULL,
LogWkDay INT NOT NULL,
LogValue VARCHAR(20),
CONSTRAINT PK_logging_day5 PRIMARY KEY CLUSTERED (ID ASC, LogWkDay ASC))

CREATE TABLE logging_day6 (
ID INT NOT NULL,
LogTime DATETIME NOT NULL,
LogWkDay INT NOT NULL,
LogValue VARCHAR(20),
CONSTRAINT PK_logging_day6 PRIMARY KEY CLUSTERED (ID ASC, LogWkDay ASC))

CREATE TABLE logging_day7 (
ID INT NOT NULL,
LogTime DATETIME NOT NULL,
LogWkDay INT NOT NULL,
LogValue VARCHAR(20),
CONSTRAINT PK_logging_day7 PRIMARY KEY CLUSTERED (ID ASC, LogWkDay ASC))

ALTER TABLE logging_day1 WITH CHECK ADD CONSTRAINT CK_logging_day1 CHECK (LogWkDay = 1)
ALTER TABLE logging_day1 CHECK CONSTRAINT CK_logging_day1

ALTER TABLE logging_day2 WITH CHECK ADD CONSTRAINT CK_logging_day2 CHECK (LogWkDay = 2)
ALTER TABLE logging_day2 CHECK CONSTRAINT CK_logging_day2

ALTER TABLE logging_day3 WITH CHECK ADD CONSTRAINT CK_logging_day3 CHECK (LogWkDay = 3)
ALTER TABLE logging_day3 CHECK CONSTRAINT CK_logging_day3

ALTER TABLE logging_day4 WITH CHECK ADD CONSTRAINT CK_logging_day4 CHECK (LogWkDay = 4)
ALTER TABLE logging_day4 CHECK CONSTRAINT CK_logging_day4

ALTER TABLE logging_day5 WITH CHECK ADD CONSTRAINT CK_logging_day5 CHECK (LogWkDay = 5)
ALTER TABLE logging_day5 CHECK CONSTRAINT CK_logging_day5

ALTER TABLE logging_day6 WITH CHECK ADD CONSTRAINT CK_logging_day6 CHECK (LogWkDay = 6)
ALTER TABLE logging_day6 CHECK CONSTRAINT CK_logging_day6

ALTER TABLE logging_day7 WITH CHECK ADD CONSTRAINT CK_logging_day7 CHECK (LogWkDay = 7)
ALTER TABLE logging_day7 CHECK CONSTRAINT CK_logging_day7

GO

CREATE PROCEDURE dbo.CleanLogTables
AS
BEGIN
DECLARE @DayOfWeekToDelete int = DATEPART(dw, DATEADD(day, -6, GETDATE()))
DECLARE @SQL NVARCHAR(2000)

SET @SQL = 'TRUNCATE TABLE logging_day' + CAST(@DayOfWeekToDelete AS NVARCHAR(1))

EXEC sp_executesql @SQL
END

GO

USE lumbagoltd
GO

CREATE VIEW logging
AS
SELECT * FROM lumbagoltd_log.dbo.logging_day1
UNION ALL
SELECT * FROM lumbagoltd_log.dbo.logging_day2
UNION ALL
SELECT * FROM lumbagoltd_log.dbo.logging_day3
UNION ALL
SELECT * FROM lumbagoltd_log.dbo.logging_day4
UNION ALL
SELECT * FROM lumbagoltd_log.dbo.logging_day5
UNION ALL
SELECT * FROM lumbagoltd_log.dbo.logging_day6
UNION ALL
SELECT * FROM lumbagoltd_log.dbo.logging_day7

GO

CREATE VIEW logging_today
AS
SELECT * FROM logging WHERE LogWkDay = DATEPART(dw, GETDATE())

GO

INSERT INTO logging
SELECT 1, GETDATE(), DATEPART(dw, getdate()), 'Something' UNION ALL
SELECT 2, '2010-01-12 15:36:58', DATEPART(dw, '2010-01-12 15:36:58'), 'Something' UNION ALL
SELECT 3, '2010-01-14 15:36:58', DATEPART(dw, '2010-01-14 15:36:58'), 'Something' UNION ALL
SELECT 4, '2010-01-21 15:36:58', DATEPART(dw, '2010-01-21 15:36:58'), 'Something' UNION ALL
SELECT 5, '2010-01-20 15:36:58', DATEPART(dw, '2010-01-20 15:36:58'), 'Something' UNION ALL
SELECT 6, '2010-01-08 15:36:58', DATEPART(dw, '2010-01-08 15:36:58'), 'Something' UNION ALL
SELECT 7, '2010-01-30 15:36:58', DATEPART(dw, '2010-01-30 15:36:58'), 'Something' UNION ALL
SELECT 8, '2010-01-11 15:36:58', DATEPART(dw, '2010-01-11 15:36:58'), 'Something'

SELECT * FROM logging
GO

EXEC lumbagoltd_log.dbo.CleanLogTables
GO

SELECT * FROM logging
GO

--> Cleanup
USE master
GO
DROP DATABASE lumbagoltd
DROP DATABASE lumbagoltd_log


- Lumbago
http://xkcd.com/327/

Kristen
Test

22859 Posts

Posted - 2010-01-12 : 10:49:04
@Lumbago:

Edit: My post crossed with yours, will look at yours now.

Can I INSERT into that VIEW as efficiently as INSERT INTO lumbagoltd_log.dbo.logging_1(Col1, ...)?

If not I think I'd prefer to explicitly name the table in the Logging Sproc.

I can see the benefit of the VIEW for Queries though

@AndrewMurphy: Yup, it would be timely to reconsider the logging.

Two purposes: Debugging and Stats.

For debugging we log every user action, web page request, Sproc that is run, when/if the user's browser finishes rendering the page (via a "callback"), and so on. Everything we can think of to log, we do. Most of this data originates in SQL Server (the page has to get Session State data from the DB, and as a side effect of that we log the page view). The stuff that Sprocs want to log would be harder (I think?) to log to a flat file, and I reckon it would be much harder to then make use of that data - my primary skill is SQL; when confronted with "Where/How often is this happening?" I am agile with a SQL query against log tables that I am familiar with.

The perceived problem with the Web is that users rarely report problems, and with high numbers of visitors any emerging problems should be resolved "before users alert us". So there's no real opportunity to "turn on when needed". If MS added an ESP module that turns on logging just-in-time, that would be good.

So we log all this stuff on the off chance that we need it. Probably about 1 session a day is investigated, on average. With a major problem we'd look at several. A mailing with an incorrect URL to a missing image can be fixed by copying the image to where it is needed, or "faking" the URL. (Web logs would get us that too, but probably not "until tomorrow", by which time the mailing campaign is stale...)

We do report on the logs for things that have an Error code (we could tackle that by using a trigger to duplicate those rows into an UrgentLog table).
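Roughly this sort of thing, I imagine (a sketch only - the names and the success-code convention are made up):

```sql
-- Duplicate error rows into a small, long-lived UrgentLog table as they
-- arrive, so the main log can still be purged aggressively
CREATE TRIGGER dbo.trg_Log_Errors
ON dbo.MyLoggingTable
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.UrgentLog (StartDateTime, ProcessName, ErrorCode)
    SELECT StartDateTime, ProcessName, ErrorCode
    FROM inserted
    WHERE ErrorCode <> 0;   -- assuming 0 means success
END;
```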

Separately we use the data for Stats. Aggregating together numbers of hits / page, the quartile performance (elapsed time) per page and a few other things; any other web stats are done with Google Analytics or similar (don't want to overload my servers when clients can get the data for free elsewhere).

So I'm loath to dispense with this logging data, as it enables us to be very agile in providing good end-Customer service to our Clients. But 99% of the stuff is never used, and it's a large (for us) volume of data.

I'm open to ideas though

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-01-12 : 13:25:59
Soft partitioning is explained here too
http://weblogs.sqlteam.com/peterl/archive/2007/09/24/Horizontal-partitioning.aspx

Yes, you can insert into a partitioned view.
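For example, against the tables above the partitioning column routes each row to the one member table whose CHECK constraint it satisfies (as I understand the rules, the column must be in the primary key, have the CHECK constraints, and the INSERT must supply every column - worth verifying in Books Online):

```sql
-- Inserting through the partitioned view from the earlier example:
-- LogWkDay routes the row to the matching logging_dayN member table
INSERT INTO logging (ID, LogTime, LogWkDay, LogValue)
VALUES (9, GETDATE(), DATEPART(dw, GETDATE()), 'Routed automatically')
```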



N 56°04'39.26"
E 12°55'05.63"

Kristen
Test

22859 Posts

Posted - 2010-01-12 : 13:58:42
Very useful link, thanks Peso
   
