Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Effective Date Pattern

Author  Topic 

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2007-10-03 : 08:11:21
I'm wondering if anyone has a better Effective Date Pattern than what I've been doing.

Simply put, I'm working with a database that has an "Effective Date" (and sometimes an "Effective Sequence", as well) on just about every table.

The two scenarios I most frequently encounter are:
1.) I need to query on the "most recent":

select
...
from
Source.Table t
where
not exists (
select 1
from
Source.Table t1
where
t.PK = t1.PK and
((t.EffectiveDate < t1.EffectiveDate) or
(t.EffectiveDate = t1.EffectiveDate and
t.EffectiveSequence < t1.EffectiveSequence)))


2.) I need to join two tables based on a Foreign Key and an Effective Date (Sequence):

select
...
from
Source.Table t
left join Source.OtherTable ot
on t.FK = ot.PK and
ot.EffectiveDate <= t.EffectiveDate and
not exists (
select 1
from
Source.OtherTable ot1
where
ot.PK = ot1.PK and
ot1.EffectiveDate <= t.EffectiveDate and
((ot.EffectiveDate < ot1.EffectiveDate) or
(ot.EffectiveDate = ot1.EffectiveDate and
ot.EffectiveSequence < ot1.EffectiveSequence)))


Now, this works fine, but I'm wondering if there is a better way. Once I start joining together 15 or so Effective Dated Source tables sprinkled in with many 5+ column candidate keys and some pretty heavy business rules ... my queries a) perform poorly and b) become very difficult to debug.

I'll admit that I haven't spent a ton of time investigating a better pattern.

Let me also say, I'm well aware of alternatives to the "not exists(<correlated subquery>)" approach, such as the "left join where the right-side is null" and the "in(select max())". I've found that in my domain it is so often true the that "not exists(<correlated subquery>)" is the superior performer, that I hardly even bother to test alternatives anymore.

I first started down the path of creating a CTE based view for some of these Effective Dated Source tables using a row_number() partition over Effective Date. This works well for simple situations (1. above); however, I'm finding that when applying this CTE Pattern to 2. above, I'm getting some heavy-duty sort operations in my plan that are killing performance. Plus, I can't use the CTE Pattern for my Source.OtherTable since the desired row may not be the most recent. I've tabled the CTE approach for now pending further thought.

I was think yesterday about Partitioned Indexes, but I haven't tried anything with that.

I'm tempted to think that due to my problem domain and the massive queries I need to write (I'm talking about Data Mart transformations here), my existing pattern may be all I have to work with. I just wanted to throw this out there to see if anyone can spark some new ideas for me.

Thanks

Jay
to here knows when

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-10-03 : 08:15:22
Something like this?
-- Prepare sample data
DECLARE @Sample TABLE (PkCol INT, Section VARCHAR(5), EffectiveDate DATETIME)

INSERT @Sample
SELECT 1, 'Home', '20070910 15:00' UNION ALL
SELECT 2, 'Home', '20071003 23:19' UNION ALL
SELECT 7, 'Away', '20071003 15:00' UNION ALL
SELECT 22, 'Home', '20071002 08:44' UNION ALL
SELECT 55, 'Away', '20071003 15:01'

-- Show the expected output
SELECT PkCol,
Section,
EffectiveDate
FROM (
SELECT PkCol,
Section,
EffectiveDate,
ROW_NUMBER() OVER (PARTITION BY Section ORDER BY EffectiveDate DESC) AS RecID
FROM @Sample
) AS d
WHERE RecID = 1


PkCol Section EffectiveDate
----- ------- -----------------------
55 Away 2007-10-03 15:01:00.000
2 Home 2007-10-03 23:19:00.000

E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2007-10-03 : 08:20:00
I sort of have this two in multi-part-keys where I have a Record Card [in my application] which shows a single record, and has Next/Back options which do:

SELECT TOP 1 *
FROM MyTable
WHERE PKey1 < @MyPKey1
OR (PKey1 = @MyPKey1 AND PKey2 < @MyPKey2
OR (PKey1 = @MyPKey1 AND PKey2 = @MyPKey2 AND PKey3 < @MyPKey3)
)
ORDER BY PKey1, PKey2, PKey3

an if I just had a "sequence number" column, that was always in order by PKey1, PKey2, PKey3 I would be larfing.

We do maintain a Sequence Number column on some of our hierarchical tables, so that we have "walking order" to display the tree, but if anything changes we have to renumber all/part of the table.

(Actually, we do have some reasonably intelligent stuff for that which only renumbers up to the next-higher record such that there is enough space to accommodate the renumber ...), but it still generates huge amounts of TLog when we are rapidly inserting in our Hierarchy tables (and stupidly we do NOT have a "Do not renumber until all the following inserts are done" method)

Sorry, just thinking out loud ...

Kristen
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2007-10-03 : 08:20:52
I was thinking that "ROW_NUMBER() OVER (PARTITION BY ..." might help.

Any ideas how performant that is Peso?

Kristen
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-10-03 : 08:22:48
It is very fast. And do not add many extra reads at all to the query.

I recently compared ROW_NUMBER() with DENSE_RANK() and found that ROW_NUMBER is superior in getting sequence numbers.



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-10-03 : 08:27:07
Perhaps you should make some kind of truth table for EffectiveDate?
See http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=89547#333723



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-10-03 : 08:33:01
Please post some proper and accurate sample data together with expected output.



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2007-10-03 : 08:49:37
"It is very fast"

That sounds faster than inserting into @TableVar with IDENTITY column

Kristen
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-10-03 : 08:50:59
The drawback with that method is that you need to READ again from @TableVar.
Also TOP n records for each group must be self-joined to @TableVar.



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2007-10-03 : 09:32:17
quote:
Originally posted by Peso

Please post some proper and accurate sample data together with expected output.



E 12°55'05.25"
N 56°04'39.16"




I can't easily post due to IP issues, sorry.

You've sparked some thoughts for me ... one of the pieces that I struggled with when I first looked at a row_number partion is that when I go to join to Source.OtherTable, I need to get the most recent Effective Date that is less than the Source.Table Effective Date. Obviously, you can't reference Source.Table.PK inside the Source.OtherTable derived query ... however, what I just realized is that the partitioning should be done over Source.Table and Source.OtherTable at the same time.

I'm trying to re-engineer one of my queries to give this a try ... if I get stuck, I'll concoct some sample data and expected result that will highlight where I'm stuck ...

Thanks for you help.




Jay
to here knows when
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-10-03 : 09:52:27
[code]DECLARE @Source TABLE (Pk INT, dt DATETIME)

INSERT @Source
SELECT 1, '20070910' UNION ALL
SELECT 1, '20071003' UNION ALL
SELECT 2, '20071002' UNION ALL
SELECT 2, '20070822'

DECLARE @Other TABLE (Pk INT, dt DATETIME)

INSERT @Other
SELECT 1, '20071004' UNION ALL
SELECT 1, '20070101' UNION ALL
SELECT 2, '20070901'

;WITH Yak (Pk, dt, RecID)
AS (
SELECT s.Pk,
s.dt,
ROW_NUMBER() OVER (PARTITION BY s.Pk ORDER BY s.dt DESC) AS RecID
FROM @Source AS s
INNER JOIN @Other AS o ON o.Pk = s.Pk
WHERE s.dt < o.dt
)

SELECT Pk, dt
FROM Yak
WHERE RecID = 1[/code]


E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2007-10-03 : 10:51:27
Peso, in your above query, I'll also need to order the partition by o.dt desc ...
Should the partition then be on s.PK and o.PK?

Jay
to here knows when
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-10-03 : 10:55:18
No, not since they are used for the JOIN operation, which means they are the same.



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2007-10-03 : 11:55:11
Some thoughts:

1) ROWNUMBER() is the way to go, it saves a self-join and is very efficient. I recommend creating simple "Latest" views for all of your enities using ROWNUMBER() and you can use that where you need it.

2) for point in time reporting as of an effective date, remember that you can use table-valued UDF's that might accept an "effective date" parameter. That, combined with the new CROSS APPLY and OUTER APPLY features in SQL 2005, this should allow you to do whatever you need fairly efficiently. So, in addition to your "Latest" views, you could create "As-Of" user-defined-functions for each of your tables as well.

In scenario #2, that means you can write something like this:

select A.*, B.*
from dbo.TableA_AsOf(@EffectiveDate) A
inner join dbo.TableB_AsOf(@EffectiveDate) B on A.ID= B.ID

test for performance, of course, but certainly if you can create standard views/UDF's like this it will make your SQL very clean and clear and easy to work with.

- Jeff
http://weblogs.sqlteam.com/JeffS
Go to Top of Page

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2007-10-04 : 08:11:04
Thanks for you input, Jeff ... my main problem is when dbo.TableB_AsOf needs to consume a.EffectiveDate ... this puts performance in the toilet. I think I have a good handle on using the row_number partition across the joins of all my tables ordered by the various EffectiveDates ... thanks everyone for you input.

Jay
to here knows when
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2007-10-04 : 09:18:51
I've found the best thing for effective dates is a sequence id for the rest of the key. I waited a long time to add this column in, but I finally decided to add it about 6 months ago... and speed has been up . To me the key to adding the sequence id was getting a trigger in place so I didn't have to manage it.

Granted, I'm still working in 2000 without all of the fancy shmancy row_number options .

Corey

snSQL on previous signature "...Oh and by the way Seventhnight, your signature is so wrong! On so many levels, morally, gramatically and there is a typo!"
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2007-10-05 : 09:16:18
"I've found the best thing for effective dates is a sequence id for the rest of the key"

Do you mean that you have a "walking order sequence" number column?

We have that on a Hierarchy table. But we store it in a separate, parallel, one:one table.

The main table is "audited". We don't want updates caused only by changes to sequence number to be audited.

Also, on bulk-inserts we turn OFF the automatic-renumber thingie, and then rebuild the whole Sequence table afterwards. Reduces the number of hits on the TLog significantly!

(e.g. if we move a record to a different part of the hierarchy and it has loads of child items that need to be moved too)

Kristen
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2007-10-05 : 09:40:41
quote:
Originally posted by Page47

Thanks for you input, Jeff ... my main problem is when dbo.TableB_AsOf needs to consume a.EffectiveDate ... this puts performance in the toilet. I think I have a good handle on using the row_number partition across the joins of all my tables ordered by the various EffectiveDates ... thanks everyone for you input.

Jay
to here knows when



You actually can have TableB consume the effective date of TableA via a UDF if you use cross apply:

select *
from dbo.TableA(@EffectiveDate) A
cross apply dbo.TableB(A.EffectiveDate, A.ID)

but I am not sure of the performance implications.



- Jeff
http://weblogs.sqlteam.com/JeffS
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-10-05 : 16:26:44
There is not only CROSS APPLY.
You also have OUTER APPLY, which is pretty much like "LEFT JOIN WHERE IS NULL".



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page
   

- Advertisement -