Compilation of incrementally entered data

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2008 Forums

Transact-SQL (2008)

Compilation of incrementally entered data

Author

Topic

WindChaser
Posting Yak Master

225 Posts

Posted - 2011-03-21 : 08:05:05

Hi folks,

This is probably simple but I've never developped a query of this type and I'm clueless. Take the following data which only contains changes with respect to prior status:

Address          City        State   Zip    Telephone    RecordDate
123 Main st.     New York    NY      12345  2127889999   March 5, 2011
Null             Null        Null    14522  Null         March 6, 2011
Null             Null        Null    Null   2127887777   March 7, 2011

The result set of the query for a current status should be:

123 Main st.     New York    NY      14522  2127887777

I'm hoping there's a keyword that I don't know about which will do just that. Thanks for your help!

DonAtWork
Master Smack Fu Yak Hacker

2167 Posts

Posted - 2011-03-21 : 08:13:48

strange that the record only has the changes, and not the value for all of the fields...

http://weblogs.sqlteam.com/jeffs/archive/2008/05/13/question-needed-not-answer.aspx
How to ask: http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx
For ultra basic questions, follow these links.
http://www.sql-tutorial.net/
http://www.firstsql.com/tutor.htm
http://www.w3schools.com/sql/default.asp

jimf
Master Smack Fu Yak Hacker

2875 Posts

Posted - 2011-03-21 : 08:15:52

Does the table at least have some kind of key that says that these records are related? Also, what is the data type of RecordDate?

Jim

Everyday I learn something that somebody else already knew

WindChaser
Posting Yak Master

225 Posts

Posted - 2011-03-21 : 08:30:57

I only posted a simple example of the data which would relate to a particular account. There is an ID key and an Account field. The record date is actually an Integer. The reason why the information is entered incrementally is to avoid the DB taking up a lot of space. My understanding is that a Null only takes 2 bytes in a VarChar field as opposed to repeating the unchanged information which would take up much more space.

A more complete example would therefore be like

ID  Account Address          City        State   Zip    Telephone    RecordDate
1   1       123 Main st.     New York    NY      12345  2127889999   40605
2   1       Null             Null        Null    14522  Null         40606
3   1       Null             Null        Null    Null   2127887777   40607

WindChaser
Posting Yak Master

225 Posts

Posted - 2011-03-23 : 13:17:11

Hi again folks,

So I guess there's no way to build the final result using a SQL query which rejects the NULL values. I guess I'll have to create a routine in Visual Studio to do it then. Thanks for looking!

jimf
Master Smack Fu Yak Hacker

2875 Posts

Posted - 2011-03-23 : 13:23:06

It may be impossible in SQL, as there are so many combinations to consider. This query would be really easy if you changed your table. You're not really saving any space doing it this way, especially if the old values aren't needed.

Jim

Everyday I learn something that somebody else already knew

WindChaser
Posting Yak Master

225 Posts

Posted - 2011-03-23 : 13:46:25

Jim,

OK, now we're getting into table design. Do you have any suggestions as to how best to restructure this potentially?

jimf
Master Smack Fu Yak Hacker

2875 Posts

Posted - 2011-03-23 : 14:54:00

Your design isn't bad, just populate all the fields when there is a change to any one of them. But if these updates are just corrections, i.e, the zip should have been 14522 all along, just update the table with correct info. You might want to split off phone number into its own table as many people have more than 1 phone number.

Jim

Everyday I learn something that somebody else already knew

TimS
Posting Yak Master

198 Posts

Posted - 2011-03-23 : 16:06:23

SQL COALESCE example
http://www.sqlbook.com/SQL/SQL-Coalesce-28.aspx

This keyword should help solve this problem. Note: it is not enough to solve the problem by itself.

On second thought, I am not sure it will work.

This might work below for a single column; I have not done SQL in over 5 years.

SELECT TOP 1 t.column
FROM table t
where t.column is not null
order by t.date desc

Tim S.

WindChaser
Posting Yak Master

225 Posts

Posted - 2011-03-23 : 19:45:35

Thanks TimS, I had already looked into coalescing but I can't figure out how I might filter things properly.

And thanks to jimf but I don't understand your premise. The storage size for varchar fields is the actual length of data entered + 2 bytes. So by definition, if all fields are constantly documented, it will take up a lot more space than just documenting alterations (unless SQL Server had AI and could do pattern recog to save the space, but my understanding is that's not the case).

If there's a more efficient storage methodology which is best practice, I'd appreciate any input. Thanks everyone!

jimf
Master Smack Fu Yak Hacker

2875 Posts

Posted - 2011-03-24 : 07:26:28

You're really not saving that much space, though you may be saving some. However, that old data doesn't seem necessary and that's a waste of space and now you have a table that you can't really use, which is also a waste of space. Also, there is overhead associated with null value as sql still has to make room in case that column does get a value. I'm trying to find the article I read about that. If you're at the beginning of the design phase you have a chance to make it right the first time -- read up on Normalization and that's a start.

Jim

Everyday I learn something that somebody else already knew

michael.appleton
Posting Yak Master

160 Posts

Posted - 2011-03-24 : 07:28:43

I know where you're coming from as I've looked at storing data the same way you have before. The thing is storing temporal data in a relation database is a lot easier if you stop thinking about only optimising on space. Space is cheap and as you've found out the method you've picked is actually a lot harder to work with. Optimisation doesn't mean minimising everything, because that's actually impossible. By minimising space, you've actually harmed performance. Optimising is actually minimising the combination of all the different factors that constrain a database.

I'd also add in another column called NextRecordDate which stores the date of the next record in that account like this:



CREATE TABLE Accounts(
	Account int,
	Address varchar(1000),    
	City varchar(255),
	State char(2),
	Zip char(5),
	Telephone char(10),
	RecordDate datetime,
	NextRecordDate datetime,
	Primary Key(Account,RecordDate),
	UNIQUE(Account,NextRecordDate)
)

With the data stored like this:



Account Address          City        State	Zip    Telephone    RecordDate	NextRecordDate
1       123 Main st.     New York    NY		12345  2127889999   20110305	20110306
1       123 Main st.     New York    NY     14522   2127889999   20110306	20110307
1       123 Main st.     New York    NY     14522   2127887777   20110307	99991231

Where there is no next record I store a date far in the future. This could be argued as a waste of space as the data is stored in two place in the database, but it makes things so much easier. To get a version of the data at any point in time the query is simply:


DECLARE @Date datetime
SET @Date='20110306'  --or whatever date you like

SELECT * FROM Accounts WHERE RecordDate<=@Date AND NextRecordDate>@Date

Then you'll get just one row per account.

Hope this all helps.

michael.appleton
Posting Yak Master

160 Posts

Posted - 2011-03-24 : 07:32:31

Check out my thinking a few years ago:

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=102811

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2011-03-24 : 07:49:06

If I had a choice I would design it slightly differently.

I would keep the latest data in the main table, and the historical/audit data in a separate audit table, which has the same columns along with two or three additional columns to indicate when the change happened, who changed it etc.

The advantage of doing it this way are:
a) easy to query the latest data. You wouldn't need to coalesce etc. to get it.
b) the audit table can be populated via a trigger - so less room for client code to incorrectly enter updates.
c) you can keep track of even deletions from the table etc.

[Getting off from her soap box now :--)]

Having said all that, here is a way in which you can coalesce your existing table to get the data you want.[code][/code

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2011-03-24 : 08:04:44

My previous post somehow got messed up. This is the code that is missing from that post.

create table #tmp 
(  ID int,  
   Account int, 
   [Address] varchar(255),          
   City  varchar(255),      
   [State] varchar(255),
   Zip  varchar(255),  
   Telephone varchar(255),   
   RecordDate datetime
);

insert into #tmp values (1,1,'123 Main st.','New York','NY','12345','2127889999','20110101');
insert into #tmp values (2,1,Null,Null,Null,'14522',Null,'20110203');
insert into #tmp values (3,1,Null,Null,Null,Null,'2127887777','20110207');
insert into #tmp values (1,2,'123 Washington st.','Stamford','CT','06901','2035551212','20110401');
insert into #tmp values (1,3,'123 Lincoln st.','Elm Grove','WI','53122','2625551212','20100101');
insert into #tmp values (2,3,'128 Lincoln st.',null,'WI','53122','2625551213','20100101');

with cte1 as
(
   select Id,Account,[Address],City,[State],Zip,Telephone,RecordDate from #tmp where Id = 1
   union all
   select
      a.Id,
      a.Account,
      coalesce(a.[Address],b.[Address]),
      coalesce(a.City,b.City),
      coalesce(a.[State],b.[State]),
      coalesce(a.[Zip],b.[Zip]),
      coalesce(a.Telephone,b.Telephone),
      coalesce(a.RecordDate,b.RecordDate)
   from
      #tmp a inner join cte1 b on a.id = b.id+1 and a.account= b.account
),
cte2 as
(
   select *,row_number() over (partition by [Account] order by Id desc)rn 
   from cte1
)
select * from cte2 where rn = 1
   
drop table #tmp;

WindChaser
Posting Yak Master

225 Posts

Posted - 2011-03-24 : 08:14:04

Jimf, the null takes 2 bytes of overhead, which is of course necessary and acceptable. But repeating the unchanged data from line to line not only takes up more space but also involves having to do a comparison between the lines in order to determine what the differences are. This DB design is interesting because of those reasons... if it can be managed easily of course...

Michael, thanks for that advice, and I'll read up on the extra topic that you provided.

Actually, I do have both a "latest status" table which provides the current address set along with the table that describes incremental change. The issue is when the user wants to see who/what changed on a prior date and that's when data compilation is required of course. This is to meet regulatory requirements which is forcing the issue onto the design.

WindChaser
Posting Yak Master

225 Posts

Posted - 2011-03-24 : 08:19:16

sunitabeck, I'll go run the query and let you know. Thanks!

michael.appleton
Posting Yak Master

160 Posts

Posted - 2011-03-24 : 10:52:52

Okay seen as everyone's talking temporal data, can anyone point me to something that talks about the best way to set up indexing and primary/foreign keys? I've got several temporal data storing tables that are related to each other and figuring out how to manage keys and indexes seems quite difficult. I hope this isn't hijacking your thread but it seems relevant!

yosiasz
Master Smack Fu Yak Hacker

1635 Posts

Posted - 2011-03-24 : 11:12:26

why are you keeping incremental changes in the same table? I would suggest an Accounts_audit table that gets populated with the incremental changes. Accounts stays as is with the current values or last values.
so what you would have is


123 Main st.     New York    NY      14522  2127887777

If you don't have the passion to help people, you have no passion

WindChaser
Posting Yak Master

225 Posts

Posted - 2011-03-24 : 12:08:39

sunitabeck, I'm reporting back that your query works like an absolute charm. It's a thing of beauty! Thank!!!

Michael, I'm not a moderator but I suppose that it would be best to start a new thread in order to enable easier future referencing. I myself love to search for answers and usually find what I need without posting... as long as the threads don't become too confusing.

Yosiasz, as I said earlier, I do have a Status table and an "Audits" table. But when comes the time to compile the incremental data to produce a report for the audit, this is what this query is for.

Sqlraider
Yak Posting Veteran

65 Posts

Posted - 2011-03-24 : 12:32:04

FYI,
another way would be to use MERGE with OUTPUT. You can put the OUTPUT into an audit table, in it you can put what Action (insert/update), the date the Action took place plus what the record looked like BEFORE & AFTER the changes where made.

You're orginal table would have what the account currently looks like & you could query the OUTPUT audit table to show any & all changes made to an account.

Next Page

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources