
 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Optimize memory with varchar(max) datatype


jstikal
Starting Member

19 Posts

Posted - 2015-04-09 : 16:35:00
I am building a table change log that will track each attribute update and include the original and new values.

[BatchYearMonthKey] [int] NULL,
[BatchYearMonthDayKey] [int] NULL,
[AccountID] [varchar](200) NULL,
[Attribute] [varchar](200) NULL,
[Old_ValueAtrDefault] [varchar](200) NULL,
[New_ValueAtrDefault] [varchar](200) NULL,
[Old_ValueAtrLong] [varchar](max) NULL,
[New_ValueAtrLong] [varchar](max) NULL


The challenge is the spectrum of varchar lengths across the table. One attribute requires varchar(max), and all other attributes (about 40) are varchar(200).

I am trying to accomplish the following:

AccountID Status
1 Enabled

Now changed to

AccountID Status
1 Disabled

My log table will look like the following:

[BatchYearMonthKey] [BatchYearMonthDayKey] [AccountID] [Attribute] [Old_ValueAtrDefault] [New_ValueAtrDefault] [Old_ValueAtrLong] [New_ValueAtrLong]
201504 20150409 1 Status Enabled Disabled NULL NULL

My question:

I created two fields (Old_ValueAtrLong and New_ValueAtrLong) dedicated to the one attribute that is varchar(max). I was trying to avoid storing [Status], for example, which is varchar(200), in a field that is varchar(max). Is this the right approach? Or are there other recommendations on how to store the data in the most efficient manner?

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2015-04-09 : 18:39:30
The storage used for varchar(n) or varchar(max) is based on the actual data in the column, not the declared length. So if the value is 200 bytes, it'll take 202 bytes, since variable-length storage is n+2. If it's 1GB, it'll be 1GB+2 bytes.

So just store it in the varchar(max) column.
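You can verify this yourself with DATALENGTH, which reports bytes of actual data rather than declared size. A quick sketch (the table variable and column names here are made up for the demo):

```sql
-- Demo: actual storage follows the data, not the declared maximum
DECLARE @t TABLE (
    SmallCol varchar(200),
    BigCol   varchar(max)
);

INSERT INTO @t (SmallCol, BigCol)
VALUES ('Enabled', 'Enabled');

-- Both columns hold the same 7 bytes of data,
-- regardless of the declared length
SELECT DATALENGTH(SmallCol) AS SmallBytes,
       DATALENGTH(BigCol)   AS BigBytes
FROM @t;
```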



Tara Kizer
SQL Server MVP since 2007
http://weblogs.sqlteam.com/tarad/

ScottPletcher
Aged Yak Warrior

550 Posts

Posted - 2015-04-10 : 10:20:41
Personally, as much of a pain as it is, I would use both varchar(200[+]) and varchar(max) columns. From what I've read, and the testing I've seen, varchar(max) values do have more overhead. However, you can reduce that overhead by telling SQL Server to keep the varchar(max) values on the same page whenever possible.
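The table option being described here is 'large value types out of row'. Setting it to 0 keeps varchar(max) values in-row when they fit on the page (roughly up to 8,000 bytes); note that 0 is the default, so this is often already in effect. A sketch, assuming a hypothetical table name:

```sql
-- 0 = store varchar(max)/nvarchar(max)/varbinary(max) values in-row
--     when they fit on the data page; 1 = always push them off-row
EXEC sp_tableoption
    @TableNamePattern = N'dbo.AccountChangeLog',  -- hypothetical table name
    @OptionName       = 'large value types out of row',
    @OptionValue      = 0;
```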

MichaelJSQL
Constraint Violating Yak Guru

252 Posts

Posted - 2015-04-10 : 10:36:15
Also - you cannot index a varchar(max) column. You can have it in the INCLUDE list of an index, but not as a key column.
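For example, using the log table's columns (a sketch; the table name is an assumption):

```sql
-- Key columns must be indexable types; the varchar(max) columns
-- can only appear in the INCLUDE list
CREATE NONCLUSTERED INDEX IX_ChangeLog_Account_Attribute
ON dbo.AccountChangeLog (AccountID, Attribute)   -- hypothetical table name
INCLUDE (Old_ValueAtrLong, New_ValueAtrLong);

-- This, by contrast, fails, because varchar(max) cannot be a key column:
-- CREATE INDEX IX_Bad ON dbo.AccountChangeLog (Old_ValueAtrLong);
```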

jstikal
Starting Member

19 Posts

Posted - 2015-04-10 : 12:43:35
I'm not quite sure what the volume of updates will look like, which could determine how the data is managed.

Thank you everyone, I greatly appreciate the feedback!

Kristen
Test

22859 Posts

Posted - 2015-04-12 : 07:51:07
Do you need to store both the Old and New?

If you have a chronological record of changes, the "old" value is in the next-earlier record. That would halve your storage.

We also do not store the current value - that's in the main table :) - so we only store the "old" values when a row is updated (using the [deleted] pseudo-table in a trigger).
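That trigger approach might look something like this (a sketch; the source Account table, its Status column, and the log table name are assumptions):

```sql
-- Hypothetical audit trigger: copy the pre-update values from the
-- [deleted] pseudo-table into the log whenever Account is updated
CREATE TRIGGER trg_Account_Audit
ON dbo.Account
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.AccountChangeLog
        (BatchYearMonthKey, BatchYearMonthDayKey,
         AccountID, Attribute, Old_ValueAtrDefault)
    SELECT CONVERT(int, CONVERT(char(6), GETDATE(), 112)),  -- yyyymm
           CONVERT(int, CONVERT(char(8), GETDATE(), 112)),  -- yyyymmdd
           d.AccountID,
           'Status',
           d.Status
    FROM deleted AS d;
END;
```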

jstikal
Starting Member

19 Posts

Posted - 2015-04-13 : 10:20:59
Thank you Kristen for the feedback. I don't have to store both: a new record containing only the changes would provide the same level of detail as storing old and new in a single record - the extraction would just have to change. I also looked into the CDC feature, which is similar to the implementation you described, but it captures all attributes, so a scan of each column would be needed to identify what changed. I think for now I will include both old and new values in a record and see how the table performs in production.

Kristen
Test

22859 Posts

Posted - 2015-04-13 : 13:24:15
You could keep the existing reporting code if you create a VIEW which pairs "a record" with the "next earlier record" and in effect presents Old / New data columns in the same way the table does now.

quote:
a scan of each column would be needed to identify what changed


The first audit system I ever wrote was like that (not my choice!). It was absolutely dreadful: impossible to report off, the procedures for creating the data, and for attempting to report on it, were huge, and the data itself was absolutely massive ...

Much better to store the "before" row IMHO. There might be instances where the only thing that changes is the Status column, or similar, where it is "cheaper" to store just that change, but for general record auditing I have found storing the "before" record works very well.
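The VIEW idea above could be sketched like this on SQL Server 2008 (no LAG until 2012), using ROW_NUMBER to pair each log row with the next-earlier row for the same account and attribute. Table and column names are assumptions, and this assumes the log stores one value per row:

```sql
-- Hypothetical: present "old" (previous row) and "new" (current row)
-- side by side, the way the two-column log table does now
CREATE VIEW dbo.vw_AccountChanges
AS
WITH Ordered AS (
    SELECT AccountID, Attribute, New_ValueAtrDefault, BatchYearMonthDayKey,
           ROW_NUMBER() OVER (PARTITION BY AccountID, Attribute
                              ORDER BY BatchYearMonthDayKey) AS rn
    FROM dbo.AccountChangeLog
)
SELECT cur.AccountID,
       cur.Attribute,
       prev.New_ValueAtrDefault AS Old_Value,
       cur.New_ValueAtrDefault  AS New_Value
FROM Ordered AS cur
LEFT JOIN Ordered AS prev
       ON prev.AccountID = cur.AccountID
      AND prev.Attribute = cur.Attribute
      AND prev.rn        = cur.rn - 1;
```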