Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 SUM aggregate and irregular dates

Author  Topic 

KidSQL
Yak Posting Veteran

88 Posts

Posted - 2007-02-13 : 14:08:13
Hello,

I have a table organized as follows:

Date - ID - Value

I am trying to return a result set from this table as follows

Date - ID - Value - sum(Value)

using a group by clause and summing the values (Value) per ID over several different, but successive, dates. I would like to vary the calculation as I choose (so say the last 10 Date observations per ID up to a given date, for example). The trouble is, the 'Date' column contains dates at irregularly spaced intervals so I don't want to have to rely on a self-join query specifying date intervals because I won't really know (not with 100% accuracy, anyway) what these date differences will be (otherwise I would do a self join and use dateadd or datediff or something like that to specify my interval). Rather, I want to set up a query somehow referencing something like the last 'n' periods and vary that 'n' as I choose in order to vary the summing date intervals.

My current solution is to select a subset from this table using a unique row identifier and ordering by ID and Date into a temporary table in which I perform a self-join as follows:

select "rowid" = identity(int,1,1), * into #mytemp from mytable

I then conduct my query as follows:

select "Date1" = a.Date, "Date2" = b.Date, b.ID, "NperiodSum" = sum(b.val)
from #mytemp a,
#mytemp b
where a.ID = b.ID
and b.rowid - a.rowid = 2
group by a.Date, b.Date, b.ID, b.val, b.rowid - a.rowid
go

This works fine as all I have to do now is to indicate a rowid difference to vary my 'n'. However, it is time consuming, particularly if the recordsets are large. I was wondering if there wasn't a more efficient way of doing this, perhaps using a subquery and referencing the original table, instead of having to create an identity-column populated temp table each time, which I suspect might be consuming far more time for my trouble than is worth.

Here's a setup, if anyone's interested (ignore the fact that the dates are at regularly spaced intervals below, normally they won't be):

create table #test
(
Date smalldatetime null,
ID char(3) null,
Value float null
)
go

insert into #test
(Date,ID,Value)
Values
('2003-12-31','ABC',4.5)
insert into #test
(Date,ID,Value)
Values
('2003-12-31','DEF',2.5)
insert into #test
(Date,ID,Value)
Values
('2004-12-31','ABC',6.6)
insert into #test
(Date,ID,Value)
Values
('2004-12-31','DEF',7.2)
insert into #test
(Date,ID,Value)
Values
('2005-12-31','ABC',8.4)
insert into #test
(Date,ID,Value)
Values
('2005-12-31','DEF',1.1)
insert into #test
(Date,ID,Value)
Values
('2006-12-31','ABC',11.2)
insert into #test
(Date,ID,Value)
Values
('2006-12-31','DEF',1.4)
go

select * from #test
go

drop table #test
go


Any suggestions would be gratefully received.

-KS

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-02-13 : 14:20:34
[code]declare @records int

-- initialize the @records variable of how many records to select from newest!
select @records = 2

-- stage the data
create table #stage
(
rowid int identity(1, 1),
Date smalldatetime null,
ID char(3) null,
Value float null
)

insert #stage
(
Date,
ID,
Value
)
select date,
id,
value
from (
select '2003-12-31' as date, 'ABC' as id, 4.5 as value union all
select '2003-12-31', 'DEF', 2.5 union all
select '2004-12-31', 'ABC', 6.6 union all
select '2004-12-31', 'DEF', 7.2 union all
select '2005-12-31', 'ABC', 8.4 union all
select '2005-12-31', 'DEF', 1.1 union all
select '2006-12-31', 'ABC', 11.2 union all
select '2006-12-31', 'DEF', 1.4
) as originaltable
order by id,
date,
value

select * from #stage order by id, date

-- make intermediate table
declare @med table (id char(3), fromrowid int, torowid int)

insert @med
select id,
case when max(rowid) - @records + 1 >= min(rowid) then max(rowid) - @records + 1 else min(rowid) end,
max(rowid)
from #stage
group by id

select * from @med

select s.*
from #stage as s
inner join @med as m on m.id = s.id and m.fromrowid <= s.rowid and m.torowid >= s.rowid

drop table #stage[/code]

Peter Larsson
Helsingborg, Sweden
Go to Top of Page

KidSQL
Yak Posting Veteran

88 Posts

Posted - 2007-02-13 : 17:46:19
Hi Peter,

Thank you very much for that insightful suggestion. It seems to work nicely - I will examine it closely.

-KS
Go to Top of Page

KidSQL
Yak Posting Veteran

88 Posts

Posted - 2007-02-14 : 09:12:24
Hi,

Thanks for this suggestion. However, I'm not sure how much of an advantage it offers over my original way of doing things. It is definitely an alternative, of course - just not exactly as I imagined.

Is it possible, for instance, to somehow use a sub-query in which I indicate the count(*) of rows for a given ID whose date is <= a target date and somehow have this specify my n-period sum? I am trying to think of just a simply query without having to create other tables or use unique row identifiers, etc. Maybe something like:

select "Date1" = a.Date, "Date2" = b.Date, b.ID, "NperiodSum" = sum(b.val)
from #test a,
#test b,
#test c
where a.ID = b.ID
and a.date >= '2003-12-31'
and b.date in
(--something like: select max(date) from #test c where ID = 'ABC'
and c.date > a.date and count(*) of 'ABC' = 3)

Is something like this possible? Unfortunately, as you can see, my SQL skills are not developed enough to come up with something like that without some assistance.

Thank you for your posted (and any future) suggestions.

Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-02-14 : 09:23:45
A subquery is executed for every row in the query. That what is taking long time.
My suggestions works fast both for small sets of data, and huge sets of data.

But it is ultimately your call.


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

KidSQL
Yak Posting Veteran

88 Posts

Posted - 2007-02-14 : 09:47:03
I see - in that case, I'm fine using either of the two methods above.

Thank you for your help - I appreciate it.

-KS
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-02-14 : 09:48:56
And you can always take a look at the execution/query plan too see which query is the most efficient in terms of system resources.


Peter Larsson
Helsingborg, Sweden
Go to Top of Page
   

- Advertisement -