Please start any new threads on our new site at http://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

Our new SQL Server Forums are live! Come on over! We've restricted the ability to create new threads on these forums.

 SQL Server Forums Profile | Active Topics | Members | Search | Forum FAQ Register Now and get your question answered!
 All Forums  Site Related Forums  Article Discussion  Article: Joining to the Next Sequential Row Reply to Topic  Printer Friendly
Author  Topic
Page: of 2

USA
0 Posts

 Posted - 04/02/2008 :  07:59:54 One of the more obscure requirements that a developer may find themselves facing is the need to compare a row with its immediate sibling. One such case is when a list of values needs to be processed to produce a moving average or to smooth a sequence of statistical numbers where their order is important.  For example, values lying along a time line. The solution is actually quite simple, but not immediately obvious.Read Joining to the Next Sequential Row

SwePeso
Patron Saint of Lost Yaks

Sweden
30421 Posts

 Posted - 04/02/2008 :  12:09:07 What about this approach?```SELECT AVG(e.Gap) FROM ( SELECT d.Period, MAX(d.Stat) - MIN(d.Stat) AS Gap FROM ( SELECT Stat, Period, 0 AS aFake FROM tbStats UNION ALL SELECT Stat, DATEADD(DAY, 1, Period), 1 FROM tbStats ) AS d GROUP BY d.Period HAVING COUNT(*) = 2 ) AS e```E 12°55'05.25"N 56°04'39.16"

mharr
Starting Member

USA
20 Posts

 Posted - 04/02/2008 :  15:21:01 I believe you can still use solution #1 even if the rows are out of order, by using ROW_NUMBER() function to create a new perfectly sequential ID:``` select x.r_id xId, y.r_id yId, x.stat xStat, y.stat yStat from (SELECT ROW_NUMBER() OVER(ORDER BY period ASC, id ASC) as r_id, stat from tbStats ) x left join (SELECT ROW_NUMBER() OVER(ORDER BY period ASC, id ASC) as r_id, stat from tbStats ) y on x.r_id + 1 = y.r_id order by x.r_id ```I would think this would be more efficient than executing a TOP(1) query in a udf for each row. It may be more efficient to just put the results of the SELECT Row_Number... query into a temp table and do the Solution #1 option against the temp table, but I suspect that SQL query optimizer will recognize that both subqueries are the same and do that anyways if it thinks it is more efficient (probably depending on estimated size of the resultset).BTW, I specified both the period and id columns in the order by clause of the Row_Number function just to ensure that both results returned same results order, even if the table has same values for period in more than one row. Otherwise, subquery x and y could return rows in different order, and the join clause (x.r_id + 1 = y.r_id) would calculate on same value.Mark Edited by - mharr on 04/02/2008 15:25:18

SwePeso
Patron Saint of Lost Yaks

Sweden
30421 Posts

 Posted - 04/02/2008 :  15:24:05 If you already are into sql server 2005 thingies, why don't use an CTE?It would be fun if someone could performance test the different approaches.E 12°55'05.25"N 56°04'39.16"

mharr
Starting Member

USA
20 Posts

 Posted - 04/02/2008 :  15:38:29 quote:Originally posted by mharrI believe you can still use solution #1 even if the rows are out of order, by using ROW_NUMBER() function to create a new perfectly sequential ID: ....It jsut occurred to me after I posted this that there is a somewhat better way to do this same solution:``` select x.r_id xId, y.r_id yId, x.stat xStat, y.stat yStat, from (SELECT ROW_NUMBER() OVER(ORDER BY period ASC, id ASC) + 1 as r_id, stat from #tbStats ) x left join (SELECT ROW_NUMBER() OVER(ORDER BY period ASC, id ASC) as r_id, stat from #tbStats ) y on x.r_id = y.r_id order by x.r_id ```Or, the final solution:``` select avg(abs(x.stat - y.stat)) movingAvg from (SELECT ROW_NUMBER() OVER(ORDER BY period ASC, id ASC) + 1 as r_id, stat from #tbStats ) x left join (SELECT ROW_NUMBER() OVER(ORDER BY period ASC, id ASC) as r_id, stat from #tbStats ) y on x.r_id = y.r_id ```Mark

atulmar
Starting Member

USA
7 Posts

 Posted - 04/02/2008 :  21:03:58 I have used row_number with combination of CTE in these situation many times. It just works so simple and great.Only thing is that the plan it generates it more greater. For many records I would not suggest to use this, as it is going to use a lot of memory if there are millions of records in data set.At the same time for smaller data sets I dont see any other better option too.ThanksAtul

SwePeso
Patron Saint of Lost Yaks

Sweden
30421 Posts

 Posted - 04/03/2008 :  03:33:08 Ok, some updates for performance measuring (original 8 sample records).```Paul Alcon 2 32 reads 15 duration Paul Alcon 3 26 reads 1 duration Peso 6 reads 1 duration mharr 20 reads 2 duration```Using 2048 sample records```Paul Alcon 2 6161 reads 140 duration Paul Alcon 3 6155 reads 42 duration Peso 16 reads 15 duration mharr 16 reads 19 duration```Using 487765 sample records```Peso 2544 reads 810 duration mharr 2544 reads 1498 duration```E 12°55'05.25"N 56°04'39.16" Edited by - SwePeso on 04/09/2008 03:55:01

RevMike
Starting Member

9 Posts

 Posted - 04/04/2008 :  09:07:19 quote:Originally posted by mharrI believe you can still use solution #1 even if the rows are out of order, by using ROW_NUMBER() function to create a new perfectly sequential ID:``` select x.r_id xId, y.r_id yId, x.stat xStat, y.stat yStat from (SELECT ROW_NUMBER() OVER(ORDER BY period ASC, id ASC) as r_id, stat from tbStats ) x left join (SELECT ROW_NUMBER() OVER(ORDER BY period ASC, id ASC) as r_id, stat from tbStats ) y on x.r_id + 1 = y.r_id order by x.r_id ```This solution is absolutely screaming for a SQL WITH clause.BTW, I a general SQL expert. I work in data integration and regularly deal with about 30 different relational products and vendors. Frankly I'm not sure how well SQL Server supports "WITH", though it is SQL-99 standard.The solution would look like this...``` with tmpStats AS (SELECT ROW_NUMBER() OVER(ORDER BY period ASC, id ASC) as r_id, stat from tbStats ) select x.r_id xId, y.r_id yId, x.stat xStat, y.stat yStat from tmpStats x left join tmpStats y on x.r_id + 1 = y.r_id order by x.r_id ```[/quote]

cas_o
Posting Yak Master

United Kingdom
154 Posts

 Posted - 04/09/2008 :  03:24:33 Well I think we have to concede the award goes to Peso/mharr looking at the performance stats. Mharr's solution appears to scale more linearly.;-]... Quack Waddle

dmckinney
Starting Member

France
3 Posts

 Posted - 04/10/2008 :  02:47:34 I think the CTE approach definitely merits a mention. I attach a link to an article I wrote regarding this.Although it's called 'Linking to the previous row', I'm sure it could be adapted to link to the next row ;-)http://www.sqlservercentral.com/articles/T-SQL/62159/Regards,David McKinney.

DevelopAri
Starting Member

Iceland
1 Posts

 Posted - 04/10/2008 :  05:54:34 I solved a similar problem (taking a time-weighted average of a time-series) by using two cursors: The first cursor pointing to row n and the second to row n+1.Do you have any reservation regarding this solution?

SwePeso
Patron Saint of Lost Yaks

Sweden
30421 Posts

 Posted - 04/10/2008 :  06:54:36 Also seehttp://www.sqlteam.com/forums/topic.asp?TOPIC_ID=93911E 12°55'05.25"N 56°04'39.16"

Starnamer
Starting Member

United Kingdom
4 Posts

 Posted - 04/11/2008 :  08:11:56 After reading this I decided to try it on a real problem I have. The aim is to take a series of statuses and dates for documents and get pairs of status&date with next_status&date. This is so that further processing can look at turnaround times, etc.However, my implementation of mharr's method turns out to be slower than a simple triangular join (select a.x,min(b.x) from t join t a join t.b on a.x 0 begin set @st = 'F' set @date = dateadd(mi,rand()*1440.0*28,convert(datetime,'2008/2/1')) insert #t (id,st,date) values(@n,@st,@date) set @cp = 0.3 while @st <> 'C' begin if @st = 'F' begin set @p = rand() if @p < @cp set @st = 'C' else if @p < 0.99 set @st = 'P' end else begin set @p = rand(1) if @p < 0.9 set @st = 'F' end set @date = dateadd(mi,rand()*1440*3+5,@date) insert #t (id,st,date) values(@n,@st,@date) set @cp = @cp * 1.1 end set @n = @n -1 end select * from #t set nocount off set statistics io on set statistics time on -------------------------------------------- -- version 1 - triangular join ------------------------------ -- This is the method currently used and -- produces the desired result -- select id, sta, min(da) as 'da', stb, db from ( select a.id, a.sta, a.da, b.st as 'stb', b.date as db from ( select id, sta, da, min(db) as 'db' from ( select a.id, a.st as 'sta', a.date as 'da', b.st as 'stb', min(b.date) as 'db' from #t a join #t b on a.id=b.id and a.date < b.date and a.st <> b.st group by a.id, a.st, a.date, b.st ) x group by id, sta, da ) a join #t b on a.id=b.id and a.db=b.date ) x group by id, sta, stb, db order by id, min(da) -------------------------------------------- -- version 2 - using self join by row_number + 1 = row_number ------------ -- this was derived from comments following the article at -- http://www.sqlteam.com/article/joining-to-the-next-sequential-row -- ;with t1 as ( select row_number() over(order by id,date) as 'r_id', id, st, date from #t ) ,t4 as ( select row_number() over(order by id,date) as 'r_id', id, st, date from ( select top 1 id, st, date from t1 order by id, date union all select id, st, date from ( select b.id, b.st, b.date from t1 a join t1 b on a.r_id+1=b.r_id where (a.id=b.id and a.st<>b.st) or (a.id<>b.id) ) z ) x ) select a.id, a.st as 'sta', a.date as 'da', b.st as 'stb', b.date as 'db' from t4 a join t4 b on a.r_id+1=b.r_id and a.id=b.id order by id, da set statistics time off set statistics io off ```

SwePeso
Patron Saint of Lost Yaks

Sweden
30421 Posts

 Posted - 04/12/2008 :  15:53:54 Try this one...```;WITH Yak (id, st, date, rowid, col) AS ( SELECT id, st, date, ROW_NUMBER() OVER (ORDER BY id, date), 0 FROM #t ) SELECT id, MAX(CASE WHEN col = 0 THEN st ELSE '' END) AS sta, MAX(CASE WHEN col = 0 THEN date ELSE '' END) AS da, MAX(CASE WHEN col = 1 THEN st ELSE '' END) AS stb, MAX(CASE WHEN col = 1 THEN date ELSE '' END) AS db FROM ( select id, st, date, rowid, col from Yak union all select id, st, date, rowid - 1, col + 1 from Yak ) AS d group by id, rowid having count(*) = 2 order by rowid```E 12°55'05.25"N 56°04'39.16" Edited by - SwePeso on 04/12/2008 16:25:43

SwePeso
Patron Saint of Lost Yaks

Sweden
30421 Posts

 Posted - 04/12/2008 :  16:04:56 Results for 1000 sample records:``` derek 1 - triangular join Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table '#t'. Scan count 3, logical reads 78, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 141 ms, elapsed time = 1518 ms. derek 2 - cte's Table '#t'. Scan count 16, logical reads 208, physical reads 0, read-ahead reads 6, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 4, logical reads 702520, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 55094 ms, elapsed time = 121561 ms. peso Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table '#t'. Scan count 2, logical reads 52, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 47 ms, elapsed time = 565 ms. ```E 12°55'05.25"N 56°04'39.16" Edited by - SwePeso on 04/12/2008 16:28:57

Starnamer
Starting Member

United Kingdom
4 Posts

SwePeso
Patron Saint of Lost Yaks

Sweden
30421 Posts

 Posted - 04/14/2008 :  04:38:19 Did you use same code to generate sample data?E 12°55'05.25"N 56°04'39.16"

SwePeso
Patron Saint of Lost Yaks

Sweden
30421 Posts

 Posted - 04/14/2008 :  04:46:49 Wow. Where did those values come from?I used this code for testing```-- Peso ;WITH Yak (id, st, date, RowID) AS ( SELECT id, st, date, ROW_NUMBER() OVER (ORDER BY id, date) FROM Derek ) SELECT id, MAX(CASE WHEN col = 0 THEN st ELSE '' END) AS sta, MAX(CASE WHEN col = 0 THEN date ELSE '' END) AS da, MAX(CASE WHEN col = 1 THEN st ELSE '' END) AS stb, MAX(CASE WHEN col = 1 THEN date ELSE '' END) AS db FROM ( SELECT id, st, date, RowID, 0 AS Col FROM Yak UNION ALL SELECT id, st, date, RowID - 1, 1 FROM Yak ) AS d GROUP BY id, RowID HAVING COUNT(*) = 2 ORDER BY RowID -- Derek 1 select id, sta, min(da) as 'da', stb, db from ( select a.id, a.sta, a.da, b.st as 'stb', b.date as db from ( select id, sta, da, min(db) as 'db' from ( select a.id, a.st as 'sta', a.date as 'da', b.st as 'stb', min(b.date) as 'db' from Derek a join Derek b on a.id=b.id and a.date < b.date and a.st <> b.st group by a.id, a.st, a.date, b.st ) x group by id, sta, da ) a join Derek b on a.id=b.id and a.db=b.date ) x group by id, sta, stb, db order by id, min(da) -- Derek 3 ;with t1 as ( select row_number() over(order by id,date) as 'r_id', id, st, date from Derek ) ,t4 as ( select row_number() over(order by id,date) as 'r_id', id, st, date from ( select top 1 id, st, date from t1 order by id, date union all select id, st, date from ( select b.id, b.st, b.date from t1 a join t1 b on a.r_id+1=b.r_id where (a.id=b.id and a.st<>b.st) or (a.id<>b.id) group by b.id, b.st, b.date ) z ) x ) select a.id, a.st as 'sta', a.date as 'da', b.st as 'stb', b.date as 'db' from t4 a join t4 b on a.r_id+1=b.r_id and a.id=b.id order by id, da```SET @n = 1000 -- Peso```Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Derek'. Scan count 2, logical reads 52, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 140 ms, elapsed time = 790 ms. SET @n = 100 -- Peso Table 'Derek'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 16 ms, elapsed time = 345 ms. SET @n = 10 -- Peso Table 'Derek'. Scan count 2, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 8 ms.```E 12°55'05.25"N 56°04'39.16" Edited by - SwePeso on 04/14/2008 04:57:33

SwePeso
Patron Saint of Lost Yaks

Sweden
30421 Posts

 Posted - 04/14/2008 :  04:53:28 ```SET @n = 10 Derek 1 Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Derek'. Scan count 3, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 15 ms, elapsed time = 24 ms. Derek 3 Table 'Worktable'. Scan count 1, logical reads 359, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Derek'. Scan count 6, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 16 ms, elapsed time = 42 ms. SET @n = 100 Derek 1 Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Derek'. Scan count 3, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 47 ms, elapsed time = 462 ms. Derek 3 Table 'Worktable'. Scan count 1, logical reads 7214, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Derek'. Scan count 6, logical reads 18, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 422 ms, elapsed time = 1131 ms. SET @n = 1000 Derek 1 Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Derek'. Scan count 3, logical reads 78, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 281 ms, elapsed time = 1183 ms. Derek 3 Table 'Derek'. Scan count 18, logical reads 156, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 438 ms, elapsed time = 1321 ms.```E 12°55'05.25"N 56°04'39.16"

Starnamer
Starting Member

United Kingdom
4 Posts

 Posted - 04/14/2008 :  07:37:26 quote:Wow. Where did those values come from?The timings are just those from my laptop for @n=1000. (Latitude D810 with 1.5Gb memory). I also added a Primary Key to the test data on columns (id,date).I've now checked your query and the problem is that it doesn't eliminate those times when the status doesn't change. hence the different number of records.Data:```id st date 19 F 2008-02-11 14:58:00.000 19 P 2008-02-11 21:17:00.000 19 P 2008-02-14 11:20:00.000 19 F 2008-02-15 02:11:00.000 19 P 2008-02-15 06:07:00.000 19 P 2008-02-16 05:27:00.000 19 F 2008-02-18 09:22:00.000 19 C 2008-02-19 05:50:00.000 ```My versions:```id sta da stb db 19 F 2008-02-11 14:58:00.000 P 2008-02-11 21:17:00.000 19 P 2008-02-11 21:17:00.000 F 2008-02-15 02:11:00.000 19 F 2008-02-15 02:11:00.000 P 2008-02-15 06:07:00.000 19 P 2008-02-15 06:07:00.000 F 2008-02-18 09:22:00.000 19 F 2008-02-18 09:22:00.000 C 2008-02-19 05:50:00.000```Peso```id sta da stb db 19 F 2008-02-11 14:58:00.000 P 2008-02-11 21:17:00.000 19 P 2008-02-11 21:17:00.000 P 2008-02-14 11:20:00.000 19 P 2008-02-14 11:20:00.000 F 2008-02-15 02:11:00.000 19 F 2008-02-15 02:11:00.000 P 2008-02-15 06:07:00.000 19 P 2008-02-15 06:07:00.000 P 2008-02-16 05:27:00.000 19 P 2008-02-16 05:27:00.000 F 2008-02-18 09:22:00.000 19 F 2008-02-18 09:22:00.000 C 2008-02-19 05:50:00.000```The requirement is that it only record pairs of statuses which are different.Currently, it looks like using row_number() is fastest.

SwePeso
Patron Saint of Lost Yaks

Sweden
30421 Posts

 Posted - 04/14/2008 :  08:52:46 What if you "cheat"?Imagine you have a column named "Seq" and populate it's value like this```DECLARE @Seq INT, @Yak INT, @st CHAR(1) SET @Seq = -1 UPDATE Derek SET @Seq = @Seq + 1, @Yak = Seq = CASE WHEN st = @st THEN NULL ELSE @Seq END, @Seq = CASE WHEN @Yak IS NULL THEN @Seq - 1 ELSE @Seq END, @st = st```Then you easily can write a query like this```SELECT id, MAX(CASE WHEN Col = 0 THEN st ELSE '' END) AS sta, MAX(CASE WHEN Col = 0 THEN date ELSE '' END) AS da, MAX(CASE WHEN Col = 1 THEN st ELSE '' END) AS stb, MAX(CASE WHEN Col = 1 THEN date ELSE '' END) AS db FROM ( SELECT id, st, date, Seq, 0 AS Col FROM Derek WHERE Seq IS NOT NULL UNION ALL SELECT id, st, date, Seq - 1, 1 FROM Derek WHERE Seq IS NOT NULL ) AS d GROUP BY id, Seq HAVING COUNT(*) = 2 ORDER BY id, MAX(CASE WHEN Col = 0 THEN date ELSE '' END)```to get the result you want.E 12°55'05.25"N 56°04'39.16"
Page: of 2  Topic
 Reply to Topic  Printer Friendly Jump To: Select Forum General SQL Server Forums       New to SQL Server Programming       New to SQL Server Administration       Script Library       Data Corruption Issues       Database Design and Application Architecture SQL Server 2012 Forums       Transact-SQL (2012)       SQL Server Administration (2012)       SSIS and Import/Export (2012)       Analysis Server and Reporting Services (2012)       Replication (2012)       Availability Groups and DR (2012)       Other SQL Server 2012 Topics SQL Server 2008 Forums       Transact-SQL (2008)       SQL Server Administration (2008)       SSIS and Import/Export (2008)       High Availability (2008)       Replication (2008)       Analysis Server and Reporting Services (2008)       Other SQL Server 2008 Topics SQL Server 2005 Forums       Transact-SQL (2005)       SQL Server Administration (2005)       .NET Inside SQL Server (2005)       SSIS and Import/Export (2005)       Service Broker (2005)       Replication (2005)       High Availability (2005)       Analysis Server and Reporting Services (2005)       Express Edition and Compact Edition (2005)       Other SQL Server Topics (2005) SQL Server 2000 Forums       SQL Server Development (2000)       SQL Server Administration (2000)       Import/Export (DTS) and Replication (2000)       Transact-SQL (2000)       Analysis Services (2000)       MSDE (2000) Development Tools       ASP.NET       Reporting Services Development       Other Development Tools Site Related Forums       Site Related Discussions       Article Discussion       Poll Discussion       The Yak Corral Other Forums       SQL Server 6.5 \ SQL Server 7.0       Other Topics       MS Access       ClearTrace Support Forum Old Forums       CLOSED - General SQL Server       CLOSED - SQL Server 2005/Yukon  -------------------- Home Active Topics Frequently Asked Questions Member Information Search Page
 SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC