Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
Sean_B
Posting Yak Master
111 Posts |
Posted - 2009-01-09 : 06:52:30
|
| Hi,I've written the following sql to median check a number of trade values, each trade is compared to the median of the last 100 trades to see if it looks like a realistic value.---start declare @ThresholdPercent float; SET @ThresholdPercent = 100 SET @ThresholdPercent = (100 + @ThresholdPercent) /100 declare @Window INT;SET @window = 101 declare @median INT; SET @window = 100; SELECT tradeid,tdate,tclose, mclose FROM ( SELECT tradeid,tdate,tclose, mclose, RANK() OVER (PARTITION BY tradeid ORDER BY mclose asc) id from ( SELECT p.tradeid,p.tdate,p.tclose,x.tdate mdate,x.tclose mclose FROM #temp p CROSS apply (SELECT TOP 100 * FROM #temp t where t.tDate < P.tDate order BY t.tDate desc) x ) x ) y WHERE y.id = @window / 2 AND (tclose > (mclose * @ThresholdPercent) OR tclose > (mclose * @ThresholdPercent))---endHeres a loop to inset some test data--startDROP TABLE #tempCREATE TABLE #temp (TradeID INT IDENTITY(1,1),tDate DATETIME,tClose DECIMAL(18,6) )CREATE CLUSTERED INDEX ix_Date ON #temp(tDate)DECLARE @c int;SET @c = 0DECLARE @tDate DATETIMEDECLARE @tclose decimal(18,6)SELECT @tDate = '01/01/2008'SELECT @tclose = 10.5WHILE @c < 10000BEGIN INSERT INTO #temp (tDate,tClose) SELECT @tDate,@tcloseSELECT @tDate = DATEADD(minute,5,@tDate)SELECT @tclose = @tclose + 1SET @c = @c + 1END-----endThis works ok on small numbers, but once I get to checking millions of rows it slows right down.I've tried seperating the inner queries into intermediary stages with temporary tables that I've indexed, but it is still taking 20 plus minutes to check a million rows.Any suggestions on speeding this up would be appreciated.Sean |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2009-01-09 : 15:55:13
|
a running median of previous 100 rows over 1M rows is going to be painful no matter how you do it. Is tradeid an unbroken sequence in the same order as tDate? If so see if this is any better:btw, median is the average of the middle two values in a set with an even number of constituents so I incorporated that.declare @ThresholdPercent floatSET @ThresholdPercent = 100SET @ThresholdPercent = (100 + @ThresholdPercent) /100select t.Tradeid ,t.tDate ,t.tClose ,avg(ca.tClose) mClosefrom #temp tcross apply ( select tradeid ,tClose ,row_number() over (order by tClose) rn from #temp a where a.tradeid <= t.tradeid + 100 and a.tradeid > t.tradeid ) cawhere ca.rn in (50,51)group by t.Tradeid ,t.tDate ,t.tClosehaving t.tclose > avg(ca.tclose) * @ThresholdPercent Be One with the OptimizerTG |
 |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2009-01-10 : 07:58:25
|
Sean,You definitely need to use ROW_NUMBER() not RANK() here: if you have duplicate values the "WHERE y.id = @window / 2" condition may fail with RANK().Not an error, but there's a curious repeated condition "(tclose > (mclose * @ThresholdPercent) OR tclose > (mclose * @ThresholdPercent))". I'm not sure what you intended here.I'd be inclined to use an odd window size to sidestep the question of taking the mean of the middle two values!However, the main problem with the current query is that, for a #temp with 1,000,000 rows the execution plan will include a sort of a temporary table of 100,000,000 rows. By calculating the median within the CROSS APPLY you replace that sort with 1,000,000 sorts of 100 rows each.SELECT P.tradeid, P.tdate, P.tclose, T.mcloseFROM #temp AS PCROSS APPLY ( SELECT T.tradeid, T.tdate, T.tclose AS mclose FROM ( SELECT T.*, ROW_NUMBER() OVER (ORDER BY T.tclose ASC) AS id FROM ( SELECT TOP 101 T.* FROM #temp AS T WHERE T.tdate < P.tdate ORDER BY T.tdate DESC ) AS T ) AS T WHERE T.id = 51 ) AS TWHERE P.tclose > (T.mclose * @ThresholdPercent) For 1,000,000 rows I get 73 seconds, versus 372 seconds for your original (modified to use ROW_NUMBER(), have the same window size, remove the duplicate condition):SELECT tradeid,tdate,tclose, mclose FROM ( SELECT tradeid,tdate,tclose, mclose, ROW_NUMBER() OVER (PARTITION BY tradeid ORDER BY mclose asc) id from ( SELECT p.tradeid,p.tdate,p.tclose,x.tdate mdate,x.tclose mclose FROM #temp p CROSS apply (SELECT TOP 101 * FROM #temp t where t.tdate < p.tdate order BY t.tdate desc) x ) x) yWHERE y.id = 51AND tclose > (mclose * @ThresholdPercent) |
 |
|
|
Sean_B
Posting Yak Master
111 Posts |
Posted - 2009-01-12 : 04:37:50
|
| Thanks this is a big help, I wouldn't have thought of calculating the median within the cross apply.Sean |
 |
|
|
Sean_B
Posting Yak Master
111 Posts |
Posted - 2009-01-12 : 06:50:42
|
| Is there any way to ignore those entries where there are less than 100 results returned by the select top 101, this is only a problem at the start of the dataset, where we need to ignore the first 100 records. Is there any way of doing this within the select top 101 sub query ?Sean |
 |
|
|
|
|
|
|
|