| Author |
Topic  |
|
raxbat
Yak Posting Veteran
52 Posts |
Posted - 12/08/2007 : 10:51:19
|
Hello all! I have he table with the rates for currency pair for the last 12h. I wish to calculate with SQL query the Simpla=e mving average and Exponencial moving average for some period. Help me please wit this query.
My table: Time Rate ---------- 08:00:06 1.4378 08:00:12 1.4377 ... etc...
PS: moving average: http://en.wikipedia.org/wiki/Moving_average
Thank You |
|
|
jezemine
Flowing Fount of Yak Knowledge
USA
2875 Posts |
Posted - 12/09/2007 : 02:23:48
|
declare @t table (date datetime not null, rate float not null)
insert @t
select getdate()-10, 1 union all
select getdate()-9, 2 union all
select getdate()-8, 4 union all
select getdate()-7, 4 union all
select getdate()-6, 5 union all
select getdate()-5, 6 union all
select getdate()-4, 6 union all
select getdate()-3, 8 union all
select getdate()-2, 9 union all
select getdate()-1, 10 union all
select getdate()-0, 11 union all
select getdate()+1, 9
select * from @t
-- change the upper limit of the between clause to
-- specify how far back the moving average looks. also if your moving average looks
-- back minutes rather than days, you'll have to change the first arg to datediff.
select
t.date,
avg(tt.rate) as simpleMovingAvg
from @t AS t
join @t tt on DATEDIFF(day, tt.date, t.date) between 0 And 2
group by t.date
order by t.date
-- add more when clauses to extend how far back the weighted moving average looks.
-- adjust weights however you like, with the constraint that they all sum to 1.
-- note that you should exclude the first 2 values returned by this since they don't have
-- a full set of history - a minor boundary effect.
select
t.date,
sum(case when DATEDIFF(day, tt.date, t.date)=0 then 0.7*tt.rate
when DATEDIFF(day, tt.date, t.date)=1 then 0.2*tt.rate
when DATEDIFF(day, tt.date, t.date)=2 then 0.1*tt.rate end) as weightedMovingAvg
from @t t
join @t tt on DATEDIFF(day, tt.date, t.date) between 0 And 2
group by t.date
order by t.date
elsasoft.org |
 |
|
|
raxbat
Yak Posting Veteran
52 Posts |
Posted - 12/09/2007 : 05:03:04
|
Thanx i works! But took a long time to process al my data! I know that it is possible to compute current MA value, knowing previous value! How can I realize this?
Thanx a lot! |
 |
|
|
jezemine
Flowing Fount of Yak Knowledge
USA
2875 Posts |
Posted - 12/09/2007 : 12:44:40
|
yea, moving averages (like running totals) are a case where the set-based approach is slower than a procedural approach.
that is, a cursor may be faster (gasp!) since you would only have to read each row once, and there would be no join. To do the moving average as I have done above, each row is read 3 times, plus you have the extra overhead of a join. I'm too lazy to give you the cursor-based solution though. that's left as an exercise for you 
elsasoft.org |
 |
|
|
raxbat
Yak Posting Veteran
52 Posts |
Posted - 12/09/2007 : 12:52:38
|
How can I implement this? Thank You |
 |
|
|
jezemine
Flowing Fount of Yak Knowledge
USA
2875 Posts |
Posted - 12/09/2007 : 15:50:39
|
I think the wikipedia link you posted explains it well enough... did you try it?
elsasoft.org |
 |
|
|
raxbat
Yak Posting Veteran
52 Posts |
Posted - 12/09/2007 : 16:13:22
|
If I could ... :( Im the simple user of sql, Im far away from sql programming and difficult functions
help me please |
 |
|
|
jezemine
Flowing Fount of Yak Knowledge
USA
2875 Posts |
Posted - 12/09/2007 : 18:10:53
|
well, ok.
this code compares a cursor-based method with the one I gave above. cursor is much faster if you have thousands of rows.
On my system, with 5000 rows of sample data, the cursor took 796 ms and the other method took 12016 ms.
with 100k rows of sample data, cursor took 19 sec. other method I killed after letting it run 5 minutes.
have a look at this article, it explains the problem with queries like this: http://www.sqlservercentral.com/articles/T-SQL/61539
declare @t table (date datetime not null, rate float not null)
declare @movingAverages table (date datetime not null,
simpleMovingAvg float not null,
weightedMovingAvg float not null)
--
-- prepare some random sample data
--
declare @i int
set @i=0
while @i < 5000 --increase this number to make the perf difference even more stark
begin
insert @t select getdate()+@i, rand()
set @i = @i + 1
end
--
-- use an ugly cursor to calculate moving avgs
--
declare @date datetime
declare @rate float
declare @rateCurrent float
declare @rateMinusOne float
declare @rateMinusTwo float
declare @then datetime
set @then=getdate()
declare ugly cursor for select date, rate from @t
open ugly
while 1=1
begin
FETCH NEXT FROM ugly INTO @date, @rate
if @@FETCH_STATUS <> 0 break
set @rateMinusTwo=@rateMinusOne
set @rateMinusOne=@rateCurrent
set @rateCurrent=@rate
if @rateMinusTwo is not null and @rateMinusOne is not null and @rateCurrent is not null
begin
insert @movingAverages
select
@date
,(@rateCurrent + @rateMinusOne + @rateMinusTwo) / 3.0
,0.7 * @rateCurrent + 0.2 * @rateMinusOne + 0.1 * @rateMinusTwo
end
end
CLOSE ugly
DEALLOCATE ugly
-- see what we came up with
select t.date, t.rate, m.simpleMovingAvg, m.weightedMovingAvg
from @movingAverages m
join @t t on t.date=m.date
-- how long did the cursor take?
select datediff(ms, @then, getdate()) as [ugly cursor time]
--
-- now see how long the set-based way takes
--
set @then=getdate()
select
t.date
,avg(tt.rate) as simpleMovingAvg
,sum(case when DATEDIFF(day, tt.date, t.date)=0 then 0.7*tt.rate
when DATEDIFF(day, tt.date, t.date)=1 then 0.2*tt.rate
when DATEDIFF(day, tt.date, t.date)=2 then 0.1*tt.rate end) as weightedMovingAvg
from @t AS t
join @t tt on DATEDIFF(day, tt.date, t.date) between 0 And 2
group by t.date
order by t.date
-- how long did set-based take?
select datediff(ms, @then, getdate()) as [set based time]
elsasoft.org |
 |
|
|
raxbat
Yak Posting Veteran
52 Posts |
Posted - 12/10/2007 : 00:28:23
|
| Thanx a lot! You are the real sql guru :) |
 |
|
|
jezemine
Flowing Fount of Yak Knowledge
USA
2875 Posts |
|
|
SwePeso
Patron Saint of Lost Yaks
Sweden
29156 Posts |
Posted - 12/10/2007 : 09:45:27
|
Jezemine, will you timetest this suggestion with CROSS JOIN?
select t1.date,
t1.rate,
avg(t2.rate)
from @t AS t1
cross join @t as t2
where t2.date between t1.date - 2 and t1.date
group by t1.date,
t1.rate
E 12°55'05.25" N 56°04'39.16" |
 |
|
|
jezemine
Flowing Fount of Yak Knowledge
USA
2875 Posts |
Posted - 12/10/2007 : 10:28:44
|
quote: Originally posted by Peso
Jezemine, will you timetest this suggestion with CROSS JOIN?
select t1.date,
t1.rate,
avg(t2.rate)
from @t AS t1
cross join @t as t2
where t2.date between t1.date - 2 and t1.date
group by t1.date,
t1.rate
E 12°55'05.25" N 56°04'39.16"
on my system this was about 10x slower than the cursor on 10k rows.
cursor took 3030ms, cross join took 36376ms.
EDIT: and that's with the cursor calculating BOTH the simple and weighted moving avg. 
elsasoft.org |
Edited by - jezemine on 12/10/2007 10:29:55 |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
Sweden
29156 Posts |
Posted - 12/10/2007 : 10:43:57
|
So 36 seconds is still faster than the original inner join (killed after 5 minutes)?
E 12°55'05.25" N 56°04'39.16" |
 |
|
|
jezemine
Flowing Fount of Yak Knowledge
USA
2875 Posts |
Posted - 12/10/2007 : 11:48:31
|
no, the one I killed had 100k sample rows. I tested yours with 10k sample rows above.
running the test again on 100k rows: cursor takes 15 sec, cross join was still churning after 10 mins. killed it.
I know it's hard to take - we all harp on doing things in a set based way. but sometimes a loop is more efficient!
elsasoft.org |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
Sweden
29156 Posts |
Posted - 12/10/2007 : 13:46:19
|
I think set-based code has the fastest time  And no need for staging table either!-- Prepare sample data
DECLARE @Sample TABLE (dt DATETIME, Rate FLOAT)
INSERT @Sample
SELECT CURRENT_TIMESTAMP - 10, 1 UNION ALL
SELECT CURRENT_TIMESTAMP - 9, 2 UNION ALL
SELECT CURRENT_TIMESTAMP - 8, 4 UNION ALL
SELECT CURRENT_TIMESTAMP - 7, 4 UNION ALL
SELECT CURRENT_TIMESTAMP - 6, 5 UNION ALL
SELECT CURRENT_TIMESTAMP - 5, 6 UNION ALL
SELECT CURRENT_TIMESTAMP - 4, 6 UNION ALL
SELECT CURRENT_TIMESTAMP - 3, 8 UNION ALL
SELECT CURRENT_TIMESTAMP - 2, 9 UNION ALL
SELECT CURRENT_TIMESTAMP - 1, 10 UNION ALL
SELECT CURRENT_TIMESTAMP - 0, 11 UNION ALL
SELECT CURRENT_TIMESTAMP + 1, 9
-- Peso 2
SELECT DATEADD(DAY, dt, '19000101') AS dt,
AVG(Rate) AS SimpleMovingAvg,
SUM(wr) AS WeightedMovingAvg
FROM (
SELECT DATEDIFF(DAY, '18991230', dt) AS dt,
0.1 * Rate AS wr,
Rate,
0 AS Actualdate
FROM @Sample
UNION ALL
SELECT DATEDIFF(DAY, '18991231', dt),
0.2 * Rate,
Rate,
0
FROM @Sample
UNION ALL
SELECT DATEDIFF(DAY, '19000101', dt),
0.7 * Rate,
Rate,
1
FROM @Sample
) AS k
GROUP BY dt
HAVING MAX(Actualdate) = 1
ORDER BY dt E 12°55'05.25" N 56°04'39.16" |
Edited by - SwePeso on 12/10/2007 14:07:33 |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
Sweden
29156 Posts |
Posted - 12/10/2007 : 14:09:30
|
Jezemine, for your 5,000 record test above I get these timings
CURSOR 6,813 ms
Jezemine set-based 20,577 ms
Peso set-based 127 ms
E 12°55'05.25" N 56°04'39.16" |
 |
|
|
jezemine
Flowing Fount of Yak Knowledge
USA
2875 Posts |
Posted - 12/10/2007 : 14:11:57
|
very nice! on 100k rows, your new query finishes in 5 sec, whereas cursor takes 15 sec (EDIT: on my dev box).
@raxbat: is it clear now who the real sql guru is? 
elsasoft.org |
Edited by - jezemine on 12/10/2007 14:19:31 |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
Sweden
29156 Posts |
Posted - 12/11/2007 : 08:40:12
|
Further optimized for SQL Server 2005 (only a third of the reads for SQL Server 2000 approach). Jezemine, if you have SQL Server 2005, can you timetest this approach for SQL Server 2005 too?DECLARE @Sample TABLE (dt DATETIME, Rate FLOAT)
INSERT @Sample
SELECT CURRENT_TIMESTAMP - 10, 1 UNION ALL
SELECT CURRENT_TIMESTAMP - 9, 2 UNION ALL
SELECT CURRENT_TIMESTAMP - 8, 4 UNION ALL
SELECT CURRENT_TIMESTAMP - 7, 4 UNION ALL
SELECT CURRENT_TIMESTAMP - 6, 5 UNION ALL
SELECT CURRENT_TIMESTAMP - 5, 6 UNION ALL
SELECT CURRENT_TIMESTAMP - 4, 6 UNION ALL
SELECT CURRENT_TIMESTAMP - 3, 8 UNION ALL
SELECT CURRENT_TIMESTAMP - 2, 9 UNION ALL
SELECT CURRENT_TIMESTAMP - 1, 10 UNION ALL
SELECT CURRENT_TIMESTAMP - 0, 11 UNION ALL
SELECT CURRENT_TIMESTAMP + 1, 9
-- SQL Server 2005
SELECT DATEADD(DAY, p.ActualDate, '19000101') AS dt,
AVG(p.Rate) AS SimpleMovingAvg,
SUM(
CASE p.dt
WHEN 'dt0' THEN p.r0
WHEN 'dt1' THEN p.r1
WHEN 'dt2' THEN p.r2
ELSE 0
END
) AS WeightedMovingAvg
FROM (
SELECT Rate,
DATEDIFF(DAY, '19000101', dt) AS dt0,
0.7 * Rate AS r0,
DATEDIFF(DAY, '18991231', dt) AS dt1,
0.2 * Rate AS r1,
DATEDIFF(DAY, '18991230', dt) AS dt2,
0.1 * Rate AS r2
FROM @Sample
) AS y
UNPIVOT (
ActualDate
FOR dt IN (y.dt0, y.dt1, y.dt2)
) AS p
GROUP BY p.ActualDate
HAVING MIN(p.dt) = 'dt0'
ORDER BY p.ActualDate
-- SQL Server 2000
SELECT DATEADD(DAY, k.dt, '19000101') AS dt,
AVG(k.Rate) AS SimpleMovingAvg,
SUM(k.wr) AS WeightedMovingAvg
FROM (
SELECT DATEDIFF(DAY, '18991230', dt) AS dt,
0.1 * Rate AS wr,
Rate,
0 AS Actualdate
FROM @Sample
UNION ALL
SELECT DATEDIFF(DAY, '18991231', dt),
0.2 * Rate,
Rate,
0
FROM @Sample
UNION ALL
SELECT DATEDIFF(DAY, '19000101', dt),
0.7 * Rate,
Rate,
1
FROM @Sample
) AS k
GROUP BY k.dt
HAVING MAX(k.Actualdate) = 1
ORDER BY k.dt
E 12°55'05.25" N 56°04'39.16" |
 |
|
|
madhivanan
Premature Yak Congratulator
India
22469 Posts |
|
|
SwePeso
Patron Saint of Lost Yaks
Sweden
29156 Posts |
Posted - 12/11/2007 : 09:00:12
|
Hard to tell. With the moving average, you have a finite number of elements (records) to compare with. With a running total, there can be any number of records to compare with.
E 12°55'05.25" N 56°04'39.16" |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
Sweden
29156 Posts |
Posted - 12/11/2007 : 09:07:10
|
This method is still useful in SQL Server 2005.CREATE TABLE #Sample
(
dt DATETIME,
Rate FLOAT,
running FLOAT
)
INSERT #Sample
(
dt,
Rate
)
SELECT CURRENT_TIMESTAMP - 10, 1 UNION ALL
SELECT CURRENT_TIMESTAMP - 4, 6 UNION ALL
SELECT CURRENT_TIMESTAMP - 3, 8 UNION ALL
SELECT CURRENT_TIMESTAMP - 7, 4 UNION ALL
SELECT CURRENT_TIMESTAMP - 6, 5 UNION ALL
SELECT CURRENT_TIMESTAMP - 5, 6 UNION ALL
SELECT CURRENT_TIMESTAMP - 2, 9 UNION ALL
SELECT CURRENT_TIMESTAMP - 1, 10 UNION ALL
SELECT CURRENT_TIMESTAMP - 0, 11 UNION ALL
SELECT CURRENT_TIMESTAMP - 9, 2 UNION ALL
SELECT CURRENT_TIMESTAMP - 8, 4 UNION ALL
SELECT CURRENT_TIMESTAMP + 1, 9
DECLARE @rt FLOAT
SET @rt = 0
CREATE CLUSTERED INDEX IX_Peso ON #Sample (dt)
UPDATE #Sample
SET @rt = running = @rt + Rate
DROP INDEX #Sample.IX_Peso
SELECT dt,
Rate,
running
FROM #Sample
ORDER BY dt
DROP TABLE #Sample
E 12°55'05.25" N 56°04'39.16" |
 |
|
Topic  |
|
|
|