Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
KidSQL
Yak Posting Veteran
88 Posts
Posted - 2007-02-13 : 14:08:13
Hello,

I have a table organized as follows:

Date - ID - Value

I am trying to return a result set from this table as follows:

Date - ID - Value - sum(Value)

using a group by clause and summing the values (Value) per ID over several different, but successive, dates. I would like to vary the calculation as I choose (say, the last 10 Date observations per ID up to a given date). The trouble is, the 'Date' column contains dates at irregularly spaced intervals, so I don't want to rely on a self-join query specifying date intervals, because I won't really know (not with 100% accuracy, anyway) what those date differences will be (otherwise I would do a self-join and use DATEADD or DATEDIFF to specify my interval). Rather, I want to set up a query referencing something like the last 'n' periods and vary that 'n' as I choose in order to vary the summing date intervals.

My current solution is to select a subset from this table using a unique row identifier, ordering by ID and Date, into a temporary table as follows:

[code]select "rowid" = identity(int, 1, 1), *
into #mytemp
from mytable[/code]

I then conduct a self-join query on it as follows:

[code]select "Date1" = a.Date, "Date2" = b.Date, b.ID, "NperiodSum" = sum(b.val)
from #mytemp a, #mytemp b
where a.ID = b.ID
  and b.rowid - a.rowid = 2
group by a.Date, b.Date, b.ID, b.val, b.rowid - a.rowid
go[/code]

This works fine, as all I have to do now is indicate a rowid difference to vary my 'n'. However, it is time-consuming, particularly if the recordsets are large.

I was wondering if there wasn't a more efficient way of doing this, perhaps using a subquery referencing the original table, instead of having to create an identity-column-populated temp table each time, which I suspect is consuming far more time than it's worth.

Here's a setup, if anyone's interested (ignore the fact that the dates below are at regularly spaced intervals; normally they won't be):

[code]create table #test
(
	Date smalldatetime null,
	ID char(3) null,
	Value float null
)
go

insert into #test (Date, ID, Value) values ('2003-12-31', 'ABC', 4.5)
insert into #test (Date, ID, Value) values ('2003-12-31', 'DEF', 2.5)
insert into #test (Date, ID, Value) values ('2004-12-31', 'ABC', 6.6)
insert into #test (Date, ID, Value) values ('2004-12-31', 'DEF', 7.2)
insert into #test (Date, ID, Value) values ('2005-12-31', 'ABC', 8.4)
insert into #test (Date, ID, Value) values ('2005-12-31', 'DEF', 1.1)
insert into #test (Date, ID, Value) values ('2006-12-31', 'ABC', 11.2)
insert into #test (Date, ID, Value) values ('2006-12-31', 'DEF', 1.4)
go

select * from #test
go

drop table #test
go[/code]

Any suggestions would be gratefully received.

-KS
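As an aside on the "last n observations per ID" requirement described above: on SQL Server 2005 and later this is exactly what the ROW_NUMBER() window function handles, with no temp table or identity column needed. A minimal sketch, run against SQLite through Python purely so it is self-contained and runnable (the sample rows are taken from the post; the same windowed query shape applies on SQL Server):

```python
import sqlite3

# In-memory database seeded with the sample data from the post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table test (Date text, ID text, Value real);
insert into test values
 ('2003-12-31','ABC',4.5), ('2003-12-31','DEF',2.5),
 ('2004-12-31','ABC',6.6), ('2004-12-31','DEF',7.2),
 ('2005-12-31','ABC',8.4), ('2005-12-31','DEF',1.1),
 ('2006-12-31','ABC',11.2),('2006-12-31','DEF',1.4);
""")

n = 2  # the 'n' in "last n periods per ID"

# Number rows newest-first within each ID, keep the first n, then sum.
rows = conn.execute("""
    with numbered as (
        select ID, Date, Value,
               row_number() over (partition by ID order by Date desc) as rn
        from test
    )
    select ID, sum(Value)
    from numbered
    where rn <= ?
    group by ID
    order by ID
""", (n,)).fetchall()

print(rows)
```

Changing `n` changes the window size, which is the knob the poster wanted, and the irregular spacing of the dates is irrelevant because only their ordering matters.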
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2007-02-13 : 14:20:34
[code]declare @records int

-- initialize the @records variable with how many records to select, from newest!
select @records = 2

-- stage the data
create table #stage
(
	rowid int identity(1, 1),
	Date smalldatetime null,
	ID char(3) null,
	Value float null
)

insert #stage (Date, ID, Value)
select date, id, value
from (
	select '2003-12-31' as date, 'ABC' as id, 4.5 as value union all
	select '2003-12-31', 'DEF', 2.5 union all
	select '2004-12-31', 'ABC', 6.6 union all
	select '2004-12-31', 'DEF', 7.2 union all
	select '2005-12-31', 'ABC', 8.4 union all
	select '2005-12-31', 'DEF', 1.1 union all
	select '2006-12-31', 'ABC', 11.2 union all
	select '2006-12-31', 'DEF', 1.4
) as originaltable
order by id, date, value

select * from #stage order by id, date

-- make intermediate table
declare @med table (id char(3), fromrowid int, torowid int)

insert @med
select	id,
	case when max(rowid) - @records + 1 >= min(rowid)
		then max(rowid) - @records + 1
		else min(rowid)
	end,
	max(rowid)
from #stage
group by id

select * from @med

select s.*
from #stage as s
inner join @med as m on m.id = s.id
	and m.fromrowid <= s.rowid
	and m.torowid >= s.rowid

drop table #stage[/code]

Peter Larsson
Helsingborg, Sweden
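A minimal runnable sketch of the staging idea in the reply above, ported to SQLite through Python purely for illustration: the autoincrement key stands in for identity(1,1), the clamp mirrors the case expression, and the table and column names here are my own, not from the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table stage (rowid_ integer primary key autoincrement,
                    Date text, ID text, Value real);
""")

data = [('2003-12-31','ABC',4.5), ('2003-12-31','DEF',2.5),
        ('2004-12-31','ABC',6.6), ('2004-12-31','DEF',7.2),
        ('2005-12-31','ABC',8.4), ('2005-12-31','DEF',1.1),
        ('2006-12-31','ABC',11.2),('2006-12-31','DEF',1.4)]

# Insert ordered by (ID, Date) so the autoincrement key is contiguous
# per ID, mirroring the identity(1,1) staging step in the reply.
for row in sorted(data, key=lambda r: (r[1], r[0])):
    conn.execute("insert into stage (Date, ID, Value) values (?,?,?)", row)

records = 2  # @records: how many newest rows per ID to keep

# Per-ID rowid window [max - records + 1, max], clamped at min(rowid),
# then joined back to the staged rows and summed.
rows = conn.execute("""
    with med as (
        select ID,
               max(rowid_) - ? + 1 as fromrowid,
               max(rowid_)         as torowid,
               min(rowid_)         as minrowid
        from stage
        group by ID
    )
    select s.ID, sum(s.Value)
    from stage as s
    inner join med as m on m.ID = s.ID
        and s.rowid_ between max(m.fromrowid, m.minrowid) and m.torowid
    group by s.ID
    order by s.ID
""", (records,)).fetchall()

print(rows)
```

The design point is that the row numbering is computed once at staging time, so the final step is a plain range join rather than a per-row computation.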
KidSQL
Yak Posting Veteran
88 Posts
Posted - 2007-02-13 : 17:46:19
Hi Peter,

Thank you very much for that insightful suggestion. It seems to work nicely; I will examine it closely.

-KS
KidSQL
Yak Posting Veteran
88 Posts
Posted - 2007-02-14 : 09:12:24
Hi,

Thanks for this suggestion. However, I'm not sure how much of an advantage it offers over my original way of doing things. It is definitely an alternative, of course; just not exactly what I imagined.

Is it possible, for instance, to somehow use a sub-query in which I indicate the count(*) of rows for a given ID whose date is <= a target date, and somehow have this specify my n-period sum? I am trying to think of just a simple query, without having to create other tables or use unique row identifiers, etc. Maybe something like:

[code]select "Date1" = a.Date, "Date2" = b.Date, b.ID, "NperiodSum" = sum(b.val)
from #test a, #test b, #test c
where a.ID = b.ID
  and a.date >= '2003-12-31'
  and b.date in (
	--something like: select max(date) from #test c
	--where ID = 'ABC' and c.date > a.date and count(*) of 'ABC' = 3
  )[/code]

Is something like this possible? Unfortunately, as you can see, my SQL skills are not developed enough to come up with something like that without some assistance.

Thank you for your posted (and any future) suggestions.
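For what it's worth, the count-based sub-query being reached for here can be written as a correlated subquery: keep a row when no more than n rows of the same ID have a Date on or after its own. A sketch in SQLite through Python so it is runnable (the same shape works on SQL Server, though, as the reply below points out, the inner query runs once per outer row, which is why it scales poorly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table test (Date text, ID text, Value real);
insert into test values
 ('2003-12-31','ABC',4.5), ('2003-12-31','DEF',2.5),
 ('2004-12-31','ABC',6.6), ('2004-12-31','DEF',7.2),
 ('2005-12-31','ABC',8.4), ('2005-12-31','DEF',1.1),
 ('2006-12-31','ABC',11.2),('2006-12-31','DEF',1.4);
""")

n = 3  # sum the newest n observations per ID

# A row survives the filter when it is among the newest n rows of its ID:
# i.e. at most n rows of the same ID have a Date >= its own Date.
rows = conn.execute("""
    select a.ID, sum(a.Value)
    from test as a
    where (select count(*)
           from test as b
           where b.ID = a.ID and b.Date >= a.Date) <= ?
    group by a.ID
    order by a.ID
""", (n,)).fetchall()

print(rows)
```

No temp table or identity column is needed, at the cost of re-running the count for every row of the outer table.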
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2007-02-14 : 09:23:45
A subquery is executed for every row in the outer query; that is what takes so long. My suggestion works fast for both small and huge sets of data.

But it is ultimately your call.

Peter Larsson
Helsingborg, Sweden
KidSQL
Yak Posting Veteran
88 Posts
Posted - 2007-02-14 : 09:47:03
I see; in that case, I'm fine using either of the two methods above.

Thank you for your help - I appreciate it.

-KS
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2007-02-14 : 09:48:56
And you can always take a look at the execution/query plan to see which query is the most efficient in terms of system resources.

Peter Larsson
Helsingborg, Sweden