Article: Joining to the Next Sequential Row

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

Site Related Forums

Article Discussion

Article: Joining to the Next Sequential Row

Author

Topic

AskSQLTeam
Ask SQLTeam Question

0 Posts

Posted - 2008-04-02 : 07:59:54

One of the more obscure requirements that a developer may find themselves facing is the need to compare a row with its immediate sibling. One such case is when a list of values needs to be processed to produce a moving average or to smooth a sequence of statistical numbers where their order is important. For example, values lying along a time line. The solution is actually quite simple, but not immediately obvious.

Read Joining to the Next Sequential Row

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-04-02 : 12:09:07

What about this approach?

SELECT	AVG(e.Gap)
FROM	(
		SELECT		d.Period,
				MAX(d.Stat) - MIN(d.Stat) AS Gap
		FROM		(
					SELECT	Stat,
						Period,
						0 AS aFake
					FROM	tbStats

					UNION ALL

					SELECT	Stat,
						DATEADD(DAY, 1, Period),
						1
					FROM	tbStats
				) AS d
		GROUP BY	d.Period
		HAVING		COUNT(*) = 2
	) AS e

E 12°55'05.25"
N 56°04'39.16"

mharr
Starting Member

20 Posts

Posted - 2008-04-02 : 15:21:01

I believe you can still use solution #1 even if the rows are out of order, by using ROW_NUMBER() function to create a new perfectly sequential ID:


select
    x.r_id xId,
    y.r_id yId, 
    x.stat xStat,
    y.stat yStat 
from 
    (SELECT ROW_NUMBER() OVER(ORDER BY period ASC, id ASC) as r_id,
     stat from tbStats ) x 
    left join 
    (SELECT ROW_NUMBER() OVER(ORDER BY period ASC, id ASC) as r_id,
     stat from tbStats ) y 
    on x.r_id + 1 = y.r_id 
order by 
    x.r_id

I would think this would be more efficient than executing a TOP(1) query in a udf for each row. It may be more efficient to just put the results of the SELECT Row_Number... query into a temp table and do the Solution #1 option against the temp table, but I suspect that SQL query optimizer will recognize that both subqueries are the same and do that anyways if it thinks it is more efficient (probably depending on estimated size of the resultset).

BTW, I specified both the period and id columns in the order by clause of the Row_Number function just to ensure that both results returned same results order, even if the table has same values for period in more than one row. Otherwise, subquery x and y could return rows in different order, and the join clause (x.r_id + 1 = y.r_id) would calculate on same value.

Mark

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-04-02 : 15:24:05

If you already are into sql server 2005 thingies, why don't use an CTE?
It would be fun if someone could performance test the different approaches.

E 12°55'05.25"
N 56°04'39.16"

mharr
Starting Member

20 Posts

Posted - 2008-04-02 : 15:38:29

quote:
Originally posted by mharr

I believe you can still use solution #1 even if the rows are out of order, by using ROW_NUMBER() function to create a new perfectly sequential ID: ....

It jsut occurred to me after I posted this that there is a somewhat better way to do this same solution:


select
    x.r_id xId,
    y.r_id yId, 
    x.stat xStat,
    y.stat yStat,
from 
    (SELECT ROW_NUMBER() OVER(ORDER BY period ASC, id ASC) + 1 as r_id,
     stat from #tbStats ) x 
    left join 
    (SELECT ROW_NUMBER() OVER(ORDER BY period ASC, id ASC) as r_id,
     stat from #tbStats ) y 
    on x.r_id = y.r_id 
order by 
    x.r_id

Or, the final solution:


select
    avg(abs(x.stat - y.stat)) movingAvg  
from 
    (SELECT ROW_NUMBER() OVER(ORDER BY period ASC, id ASC) + 1 as r_id,
     stat from #tbStats ) x 
    left join 
    (SELECT ROW_NUMBER() OVER(ORDER BY period ASC, id ASC) as r_id,
     stat from #tbStats ) y 
    on x.r_id = y.r_id

Mark

atulmar
Starting Member

7 Posts

Posted - 2008-04-02 : 21:03:58

I have used row_number with combination of CTE in these situation many times. It just works so simple and great.

Only thing is that the plan it generates it more greater. For many records I would not suggest to use this, as it is going to use a lot of memory if there are millions of records in data set.

At the same time for smaller data sets I dont see any other better option too.

Thanks
Atul

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-04-03 : 03:33:08

Ok, some updates for performance measuring (original 8 sample records).

Paul Alcon 2	  32 reads	 15 duration
Paul Alcon 3	  26 reads	  1 duration
Peso		   6 reads	  1 duration
mharr		  20 reads	  2 duration

Using 2048 sample records

Paul Alcon 2	6161 reads	140 duration
Paul Alcon 3	6155 reads	 42 duration
Peso		  16 reads	 15 duration
mharr		  16 reads	 19 duration

Using 487765 sample records

Peso		  2544 reads	 810 duration
mharr		  2544 reads	1498 duration

E 12°55'05.25"
N 56°04'39.16"

RevMike
Starting Member

9 Posts

Posted - 2008-04-04 : 09:07:19

quote:

Originally posted by mharr

I believe you can still use solution #1 even if the rows are out of order, by using ROW_NUMBER() function to create a new perfectly sequential ID:


select
    x.r_id xId,
    y.r_id yId, 
    x.stat xStat,
    y.stat yStat 
from 
    (SELECT ROW_NUMBER() OVER(ORDER BY period ASC, id ASC) as r_id,
     stat from tbStats ) x 
    left join 
    (SELECT ROW_NUMBER() OVER(ORDER BY period ASC, id ASC) as r_id,
     stat from tbStats ) y 
    on x.r_id + 1 = y.r_id 
order by 
    x.r_id

This solution is absolutely screaming for a SQL WITH clause.

BTW, I a general SQL expert. I work in data integration and regularly deal with about 30 different relational products and vendors. Frankly I'm not sure how well SQL Server supports "WITH", though it is SQL-99 standard.

The solution would look like this...


with tmpStats AS
    (SELECT ROW_NUMBER() OVER(ORDER BY period ASC, id ASC) as r_id,
     stat from tbStats )
select
    x.r_id xId,
    y.r_id yId, 
    x.stat xStat,
    y.stat yStat 
from 
    tmpStats x 
    left join 
    tmpStats y 
    on x.r_id + 1 = y.r_id 
order by 
    x.r_id

[/quote]

cas_o
Posting Yak Master

154 Posts

Posted - 2008-04-09 : 03:24:33

Well I think we have to concede the award goes to Peso/mharr looking at the performance stats. Mharr's solution appears to scale more linearly.

;-]... Quack Waddle

dmckinney
Starting Member

3 Posts

Posted - 2008-04-10 : 02:47:34

I think the CTE approach definitely merits a mention. I attach a link to an article I wrote regarding this.

Although it's called 'Linking to the previous row', I'm sure it could be adapted to link to the next row ;-)

http://www.sqlservercentral.com/articles/T-SQL/62159/

Regards,

David McKinney.

DevelopAri
Starting Member

1 Post

Posted - 2008-04-10 : 05:54:34

I solved a similar problem (taking a time-weighted average of a time-series) by using two cursors: The first cursor pointing to row n and the second to row n+1.

Do you have any reservation regarding this solution?

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-04-10 : 06:54:36

Also see
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=93911

E 12°55'05.25"
N 56°04'39.16"

Starnamer
Starting Member

4 Posts

Posted - 2008-04-11 : 08:11:56

After reading this I decided to try it on a real problem I have. The aim is to take a series of statuses and dates for documents and get pairs of status&date with next_status&date. This is so that further processing can look at turnaround times, etc.

However, my implementation of mharr's method turns out to be slower than a simple triangular join (select a.x,min(b.x) from t join t a join t.b on a.x<b.x).

So my question is, is there a faster way to do this?

As a second question, can anyone suggest why the performance of the row_number test solution goes down the tubes (on my system) at between 712 and 714 records (713 sometimes takes ~300ms CPU, others ~76000ms!).

P.S. The server has 4 CPUs hence for 1000 records, elapse time is only about 25% of CPU time.


/*
100 records generated

Table 'Worktable'. Scan count 0, logical reads 0,
physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
Table '#t'. Scan count 3, logical reads 9,
physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 16 ms,  elapsed time = 29 ms.

Table '#t'. Scan count 30, logical reads 18,
physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0,
physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 46 ms,  elapsed time = 35 ms.
-------------------------------------------
500 records generated

Table 'Worktable'. Scan count 0, logical reads 0,
physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
Table '#t'. Scan count 3, logical reads 39,
physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 78 ms,  elapsed time = 203 ms.

Table '#t'. Scan count 30, logical reads 78,
physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0,
physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 205 ms,  elapsed time = 157 ms.
---------------------------------------------
1000 records generated

Table 'Worktable'. Scan count 0, logical reads 0,
physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
Table '#t'. Scan count 3, logical reads 78,
physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 297 ms,  elapsed time = 716 ms.

Table '#t'. Scan count 28, logical reads 312,
physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
Table 'Worktable'. Scan count 8, logical reads 775568,
physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 152624 ms,  elapsed time = 40658 ms.
--------------------------------------------------
*/
set statistics io off
set statistics time off
set nocount on
-----------------------------------
-- create some data
-- ----------------
-- This actually models part of a document workflow.
-- The flow has been simplified to just 3 states.
-- Documents start in state 'F' and may move to
-- either 'C' (compete) or 'P' (queried) or may be
-- amended and stay in state 'F'. Once in state 'P',
-- the query may be answered, so the document goes 
-- to state 'F' or it may be amended and stay in
-- state 'P'. The probabilities of moving between
-- states are arbitrarily set for the model. In
-- order to simulate repeated queries making it
-- more likely for the document to be completed,
-- the probability is increased on each step.
-- Time intervals are randomly assigned between
-- 5 minutes and 1 month. In the model, the generated
-- IDs are sequential. In the real application, they
-- may not be.
-- 
drop table #t
create table #t (id int, st char(1), date datetime)

declare @n int
declare @st char(1)
declare @date datetime
declare @Cp float
declare @p float

set @n = 1000 -- <<<<<<<<<<<<<<<<< How many ID chains to generate
print convert(varchar(5),@n)+' records generated'
while @n > 0
begin
	set @st = 'F'
	set @date = dateadd(mi,rand()*1440.0*28,convert(datetime,'2008/2/1'))
	insert #t (id,st,date) values(@n,@st,@date)
	set @cp = 0.3
	while @st <> 'C'
	begin
		if @st = 'F'
		begin
			set @p = rand()
			if @p < @cp
				set @st = 'C'
			else
				if @p < 0.99
					set @st = 'P'
		end
		else
		begin
			set @p = rand(1)
			if @p < 0.9
				set @st = 'F'
		end
		set @date = dateadd(mi,rand()*1440*3+5,@date)
		insert #t (id,st,date) values(@n,@st,@date)
		set @cp = @cp * 1.1
	end
	set @n = @n -1
end
select * from #t

set nocount off
set statistics io on
set statistics time on
--------------------------------------------
-- version 1 - triangular join
------------------------------
-- This is the method currently used and
-- produces the desired result
--
select 
	id,
	sta,
	min(da) as 'da',
	stb,
	db 
from (
	select
		a.id,
		a.sta,
		a.da,
		b.st as 'stb',
		b.date as db
	from (
		select
			id,
			sta,
			da,
			min(db) as 'db'
		from (
			select 
				a.id,
				a.st as 'sta',
				a.date as 'da',
				b.st as 'stb',
				min(b.date) as 'db' 
			from
				#t a
			join
				#t b
					on a.id=b.id
					and a.date < b.date
					and a.st <> b.st
			group by
				a.id,
				a.st,
				a.date,
				b.st
		) x
		group by
			id,
			sta,
			da
	) a 
	join
		#t b
			on a.id=b.id
			and a.db=b.date
) x
group by
	id,
	sta,
	stb,
	db
order by
	id,
	min(da)

--------------------------------------------
-- version 2 - using self join by row_number + 1 = row_number
------------
-- this was derived from comments following the article at
-- http://www.sqlteam.com/article/joining-to-the-next-sequential-row
--
;with t1 as (
select
	row_number() over(order by id,date) as 'r_id',
	id,
	st,
	date
from
	#t
)
,t4 as (
select
	row_number() over(order by id,date) as 'r_id',
	id,
	st,
	date 
from (
	select top 1
		id,
		st,
		date
	from
		t1 
	order by
		id,
		date
	union all
	select
		id,
		st,
		date
	from (
		select
			b.id,
			b.st,
			b.date
		from
			t1 a
		join
			t1 b 
				on a.r_id+1=b.r_id
		where 
			(a.id=b.id and a.st<>b.st)
		or	(a.id<>b.id)
	) z
) x
)
select
	a.id,
	a.st as 'sta',
	a.date as 'da',
	b.st as 'stb',
	b.date as 'db'
from
	t4 a
join
	t4 b
		on a.r_id+1=b.r_id
		and a.id=b.id
order by
	id,
	da

set statistics time off
set statistics io off

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-04-12 : 15:53:54

Try this one...

;WITH Yak (id, st, date, rowid, col)
AS (
	SELECT	id,
		st,
		date,
		ROW_NUMBER() OVER (ORDER BY id, date),
		0
	FROM	#t
)

SELECT		id,
		MAX(CASE WHEN col = 0 THEN st ELSE '' END) AS sta,
		MAX(CASE WHEN col = 0 THEN date ELSE '' END) AS da,
		MAX(CASE WHEN col = 1 THEN st ELSE '' END) AS stb,
		MAX(CASE WHEN col = 1 THEN date ELSE '' END) AS db
FROM		(
			select id, st, date, rowid, col from Yak
			union all
			select id, st, date, rowid - 1, col + 1 from Yak
		) AS d
group by	id,
		rowid
having		count(*) = 2
order by	rowid

E 12°55'05.25"
N 56°04'39.16"

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-04-12 : 16:04:56

Results for 1000 sample records:


derek 1 - triangular join

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#t'. Scan count 3, logical reads 78, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 141 ms,  elapsed time = 1518 ms.

derek 2  - cte's

Table '#t'. Scan count 16, logical reads 208, physical reads 0, read-ahead reads 6, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 4, logical reads 702520, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 55094 ms,  elapsed time = 121561 ms.

peso

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#t'. Scan count 2, logical reads 52, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 47 ms,  elapsed time = 565 ms.

E 12°55'05.25"
N 56°04'39.16"

Starnamer
Starting Member

4 Posts

Posted - 2008-04-14 : 04:30:00

There was an error in my version 2. There should have been an additional "group by b.id, b.st, b.date" just be before the close of subquery 'z'. With this correction for 100 records I get...

Derek 1
(4108 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#t'. Scan count 3, logical reads 51, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 63 ms, elapsed time = 2330 ms.

Derek 2
(4108 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#t'. Scan count 6, logical reads 72, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 2499 ms.

Derek 3[(Similar to 2, but uses dense_rank() over(partition by id order by date))
(4108 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#t'. Scan count 6, logical reads 72, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 47 ms, elapsed time = 1736 ms.

peso
(4317 row(s) affected)
Table '#t'. Scan count 2, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 47 ms, elapsed time = 1534 ms.

Unfortunately, this seems slower than my corrected version 2 and also produces extra rows.

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-04-14 : 04:38:19

Did you use same code to generate sample data?

E 12°55'05.25"
N 56°04'39.16"

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-04-14 : 04:46:49

Wow. Where did those values come from?
I used this code for testing

-- Peso
;WITH Yak (id, st, date, RowID)
AS (
	SELECT	id,
		st,
		date,
		ROW_NUMBER() OVER (ORDER BY id, date)
	FROM	Derek
)

SELECT		id,
		MAX(CASE WHEN col = 0 THEN st ELSE '' END) AS sta,
		MAX(CASE WHEN col = 0 THEN date ELSE '' END) AS da,
		MAX(CASE WHEN col = 1 THEN st ELSE '' END) AS stb,
		MAX(CASE WHEN col = 1 THEN date ELSE '' END) AS db
FROM		(
			SELECT id, st, date, RowID, 0 AS Col FROM Yak
			UNION ALL
			SELECT id, st, date, RowID - 1, 1 FROM Yak
		) AS d
GROUP BY	id,
		RowID
HAVING		COUNT(*) = 2
ORDER BY	RowID

-- Derek 1
select 
	id,
	sta,
	min(da) as 'da',
	stb,
	db 
from (
	select
		a.id,
		a.sta,
		a.da,
		b.st as 'stb',
		b.date as db
	from (
		select
			id,
			sta,
			da,
			min(db) as 'db'
		from (
			select 
				a.id,
				a.st as 'sta',
				a.date as 'da',
				b.st as 'stb',
				min(b.date) as 'db' 
			from
				Derek a
			join
				Derek b
					on a.id=b.id
					and a.date < b.date
					and a.st <> b.st
			group by
				a.id,
				a.st,
				a.date,
				b.st
		) x
		group by
			id,
			sta,
			da
	) a 
	join
		Derek b
			on a.id=b.id
			and a.db=b.date
) x
group by
	id,
	sta,
	stb,
	db
order by
	id,
	min(da)

-- Derek 3
;with t1 as (
select
	row_number() over(order by id,date) as 'r_id',
	id,
	st,
	date
from
	Derek
)
,t4 as (
select
	row_number() over(order by id,date) as 'r_id',
	id,
	st,
	date 
from (
	select top 1
		id,
		st,
		date
	from
		t1 
	order by
		id,
		date
	union all
	select
		id,
		st,
		date
	from (
		select
			b.id,
			b.st,
			b.date
		from
			t1 a
		join
			t1 b 
				on a.r_id+1=b.r_id
		where 
			(a.id=b.id and a.st<>b.st)
		or	(a.id<>b.id)
		group by b.id, b.st, b.date
	) z
) x
)
select
	a.id,
	a.st as 'sta',
	a.date as 'da',
	b.st as 'stb',
	b.date as 'db'
from
	t4 a
join
	t4 b
		on a.r_id+1=b.r_id
		and a.id=b.id
order by
	id,
	da

SET @n = 1000 -- Peso

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Derek'. Scan count 2, logical reads 52, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 140 ms,  elapsed time = 790 ms.

SET @n = 100 -- Peso

Table 'Derek'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 16 ms,  elapsed time = 345 ms.

SET @n = 10 -- Peso

Table 'Derek'. Scan count 2, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 8 ms.

E 12°55'05.25"
N 56°04'39.16"

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-04-14 : 04:53:28

[code]SET @n = 10

Derek 1

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Derek'. Scan count 3, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 24 ms.

Derek 3

Table 'Worktable'. Scan count 1, logical reads 359, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Derek'. Scan count 6, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 42 ms.

SET @n = 100

Derek 1

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Derek'. Scan count 3, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 47 ms, elapsed time = 462 ms.

Derek 3

Table 'Worktable'. Scan count 1, logical reads 7214, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Derek'. Scan count 6, logical reads 18, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 422 ms, elapsed time = 1131 ms.

SET @n = 1000

Derek 1

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Derek'. Scan count 3, logical reads 78, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 281 ms, elapsed time = 1183 ms.

Derek 3

Table 'Derek'. Scan count 18, logical reads 156, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 438 ms, elapsed time = 1321 ms.[/code]

E 12°55'05.25"
N 56°04'39.16"

Starnamer
Starting Member

4 Posts

Posted - 2008-04-14 : 07:37:26

quote:
Wow. Where did those values come from?

The timings are just those from my laptop for @n=1000. (Latitude D810 with 1.5Gb memory). I also added a Primary Key to the test data on columns (id,date).

I've now checked your query and the problem is that it doesn't eliminate those times when the status doesn't change. hence the different number of records.

Data:

id	st	date
19	F	2008-02-11 14:58:00.000
19	P	2008-02-11 21:17:00.000
19	P	2008-02-14 11:20:00.000
19	F	2008-02-15 02:11:00.000
19	P	2008-02-15 06:07:00.000
19	P	2008-02-16 05:27:00.000
19	F	2008-02-18 09:22:00.000
19	C	2008-02-19 05:50:00.000

My versions:

id	sta	da	stb	db
19	F	2008-02-11 14:58:00.000	P	2008-02-11 21:17:00.000
19	P	2008-02-11 21:17:00.000	F	2008-02-15 02:11:00.000
19	F	2008-02-15 02:11:00.000	P	2008-02-15 06:07:00.000
19	P	2008-02-15 06:07:00.000	F	2008-02-18 09:22:00.000
19	F	2008-02-18 09:22:00.000	C	2008-02-19 05:50:00.000

Peso

id	sta	da	stb	db
19	F	2008-02-11 14:58:00.000	P	2008-02-11 21:17:00.000
19	P	2008-02-11 21:17:00.000	P	2008-02-14 11:20:00.000
19	P	2008-02-14 11:20:00.000	F	2008-02-15 02:11:00.000
19	F	2008-02-15 02:11:00.000	P	2008-02-15 06:07:00.000
19	P	2008-02-15 06:07:00.000	P	2008-02-16 05:27:00.000
19	P	2008-02-16 05:27:00.000	F	2008-02-18 09:22:00.000
19	F	2008-02-18 09:22:00.000	C	2008-02-19 05:50:00.000

The requirement is that it only record pairs of statuses which are different.

Currently, it looks like using row_number() is fastest.

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-04-14 : 08:52:46

What if you "cheat"?
Imagine you have a column named "Seq" and populate it's value like this

DECLARE	@Seq INT,
	@Yak INT,
	@st CHAR(1)

SET	@Seq = -1

UPDATE	Derek
SET	@Seq = @Seq + 1,
	@Yak = Seq =	CASE
				WHEN st = @st THEN NULL
				ELSE @Seq
			END,
	@Seq =	CASE
			WHEN @Yak IS NULL THEN @Seq - 1
			ELSE @Seq
		END,
	@st = st

Then you easily can write a query like this

SELECT		id,
		MAX(CASE WHEN Col = 0 THEN st ELSE '' END) AS sta,
		MAX(CASE WHEN Col = 0 THEN date ELSE '' END) AS da,
		MAX(CASE WHEN Col = 1 THEN st ELSE '' END) AS stb,
		MAX(CASE WHEN Col = 1 THEN date ELSE '' END) AS db
FROM		(
			SELECT	id,
				st,
				date,
				Seq,
				0 AS Col
			FROM	Derek
			WHERE	Seq IS NOT NULL

			UNION ALL

			SELECT	id,
				st,
				date,
				Seq - 1,
				1
			FROM	Derek
			WHERE	Seq IS NOT NULL
		) AS d
GROUP BY	id,
		Seq
HAVING		COUNT(*) = 2
ORDER BY	id,
		MAX(CASE WHEN Col = 0 THEN date ELSE '' END)

to get the result you want.

E 12°55'05.25"
N 56°04'39.16"

Next Page

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources