| Author |
Topic  |
|
|
ms65g
Constraint Violating Yak Guru
497 Posts |
Posted - 04/09/2011 : 18:05:54
|
Who is ready to offer a most efficient approach for matching next rows with current rows (based on a sequential value with gaps in them)? Is following query most efficient or you have a better query?
WITH C AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY date ASC) rn
FROM TableName
)
SELECT *
FROM C A
JOIN C B
ON A.rn + 1 = B.rn;
______________________
|
Edited by - ms65g on 04/09/2011 18:46:57
|
|
|
SwePeso
Patron Saint of Lost Yaks
Sweden
29138 Posts |
Posted - 04/10/2011 : 15:24:37
|
It depends what the query is for. Weighted median? Other purpose?
N 56°04'39.26" E 12°55'05.63" |
 |
|
|
ms65g
Constraint Violating Yak Guru
497 Posts |
Posted - 04/10/2011 : 16:47:26
|
Can we suppose just the result is important for us, no dive to a specific problem? So if you ready, it’s the table with some sample data and the desired result:
CREATE TABLE [Sample]
(
seq_nbr INTEGER NOT NULL PRIMARY KEY
);
INSERT INTO [Sample]
VALUES (1), (5), (7), (8), (15), (50);
SELECT seq_nbr
FROM [Sample]
ORDER BY seq_nbr ASC;
/*====== Sample Data =========
seq_nbr
-----------
1
5
7
8
15
50
============ Wanted Result ==========
seq_nbr seq_nbr
----------- -----------
1 5
5 7
7 8
8 15
15 50
50 NULL
*/
And my first attempt here:
WITH C(seq_nbr, rnk) AS
(
SELECT seq_nbr,
ROW_NUMBER() OVER(ORDER BY seq_nbr ASC)
FROM [Sample]
)
SELECT A.seq_nbr, B.seq_nbr
FROM C AS A
LEFT JOIN C AS B
ON A.rnk + 1 = B.rnk;
______________________
|
 |
|
|
SwePeso
Patron Saint of Lost Yaks
Sweden
29138 Posts |
Posted - 04/11/2011 : 03:45:48
|
As I wrote before. It all depends on the usage of the data. Anyway, here is another approach from the top of my head that has less reads.
SET STATISTICS IO ON
SET STATISTICS TIME ON
-- Ms65g
;WITH C(seq_nbr, rnk) AS
(
SELECT seq_nbr,
ROW_NUMBER() OVER(ORDER BY seq_nbr ASC)
FROM [Sample]
)
SELECT A.seq_nbr, B.seq_nbr
FROM C AS A
LEFT JOIN C AS B
ON A.rnk + 1 = B.rnk;
-- Peso
SELECT s.seq_nbr,
f.seq_nbr
FROM dbo.[Sample] AS s
OUTER APPLY (
SELECT TOP(1) w.seq_nbr
FROM dbo.[Sample] AS w
WHERE w.seq_nbr > s.seq_nbr
ORDER BY w.seq_nbr
) AS f
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
N 56°04'39.26" E 12°55'05.63" |
 |
|
|
ms65g
Constraint Violating Yak Guru
497 Posts |
Posted - 04/11/2011 : 07:08:58
|
Thanks for posting, For starting is not bad. According to the first post of this topic (http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=100121) you suggested an approach that has less read. I reformed your approach for compatibility with the problem (also I try to write the code in your favorite style, but for tabbing maybe you do not see aligned characters). And you will see that my next solution has less reads than yours!
SET STATISTICS IO ON
SET STATISTICS TIME ON
-- Ms65g 1
;WITH C(seq_nbr, rnk) AS
(
SELECT seq_nbr,
ROW_NUMBER() OVER(ORDER BY seq_nbr ASC)
FROM [Sample]
)
SELECT A.seq_nbr, B.seq_nbr
FROM C AS A
LEFT JOIN C AS B
ON A.rnk + 1 = B.rnk;
-- Peso 1
SELECT s.seq_nbr,
f.seq_nbr
FROM dbo.[Sample] AS s
OUTER APPLY (
SELECT TOP(1) w.seq_nbr
FROM dbo.[Sample] AS w
WHERE w.seq_nbr > s.seq_nbr
ORDER BY w.seq_nbr
) AS f;
-- Peso 2
WITH C(seq_nbr, rnk) AS
(
SELECT seq_nbr,
ROW_NUMBER() OVER(ORDER BY seq_nbr)
FROM dbo.[Sample]
)
SELECT MAX(CASE WHEN flag = 1 THEN seq_nbr END) AS seq_nbr,
MAX(CASE WHEN flag = 2 THEN seq_nbr END) AS seq_nbr
FROM (
SELECT seq_nbr,
rnk + 1 AS grp_fct,
1 AS flag
FROM C
UNION ALL
SELECT seq_nbr,
rnk,
2
FROM C
) AS D
GROUP BY grp_fct
HAVING MAX(CASE WHEN flag = 1 THEN seq_nbr END) IS NOT NULL;
--Ms65g 2
;WITH C AS
(
SELECT *,
ROW_NUMBER() OVER(ORDER BY seq_nbr ASC) AS rownum
FROM [Sample]
)
SELECT MAX(CASE WHEN k = 1 THEN seq_nbr END) AS seq_nbr,
MAX(CASE WHEN k = 2 THEN seq_nbr END) AS seq_nbr
FROM C
CROSS JOIN
(VALUES (1), (2)
) AS D(k)
GROUP BY (CASE k WHEN 1 THEN rownum + 1
WHEN 2 THEN rownum
END)
HAVING MAX(CASE WHEN k = 1 THEN seq_nbr END) IS NOT NULL;
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
/*
====== Ms65g 1
Table 'Sample'. Scan count 2, logical reads 15
====== Peso 1
Table 'Sample'. Scan count 7, logical reads 14
====== Peso 2
Table 'Sample'. Scan count 2, logical reads 4
====== Ms65g
Table 'Sample'. Scan count 1, logical reads 2
*/
As you see my second solution is most efficient and needs half of your best number of reads.
______________________
|
 |
|
|
SwePeso
Patron Saint of Lost Yaks
Sweden
29138 Posts |
Posted - 04/11/2011 : 08:50:21
|
I do not like that you take my suggestions from elsewhere and put them here under my name when the original solutions were meant to be used in another scenario.
N 56°04'39.26" E 12°55'05.63" |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
Sweden
29138 Posts |
Posted - 04/11/2011 : 08:53:35
|
quote: Originally posted by ms65g
As you see my second solution is most efficient and needs half of your best number of reads.
Yes, but your suggestion only handles seq_nbr column only. My suggestion handles ALL columns, IF they are going to used further.
As I have told you, the solution varies for the purpose of the data.
N 56°04'39.26" E 12°55'05.63" |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
Sweden
29138 Posts |
Posted - 04/11/2011 : 09:06:19
|
This is my own "Peso 2";WITH cte
AS (
SELECT seq_nbr,
d,
ROW_NUMBER() OVER (ORDER BY seq_nbr) / 2 AS theGrp
FROM [Sample]
CROSS JOIN (
VALUES (0),
(1)
) AS x(d)
)
SELECT MIN(seq_nbr) AS col1,
MAX(seq_nbr) AS col2
FROM cte
GROUP BY theGrp
HAVING MAX(d) = 1 And here is the statistics between your #2 and my #2 when using more sample data, or example 2514 records.
-- Peso 2 (my own version)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0.
Table 'Sample'. Scan count 1, logical reads 7, physical reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 53 ms.
-- Ms65g #2
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0.
Table 'Sample'. Scan count 1, logical reads 14, physical reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 69 ms.
N 56°04'39.26" E 12°55'05.63" |
 |
|
|
ms65g
Constraint Violating Yak Guru
497 Posts |
Posted - 04/11/2011 : 12:17:50
|
>> I do not like that you take my suggestions from elsewhere and put them here under my name when the original solutions were meant to be used in another scenario. << I only wanted to show a better solution from you.
Here’s my new version of Ms65g 2: I improve the query by some changes.
;WITH C1 AS
(
SELECT *,
ROW_NUMBER() OVER(ORDER BY seq_nbr ASC) AS rownum
FROM [Sample]
), C2 AS
(SELECT (CASE k WHEN 1 THEN rownum + 1
WHEN 2 THEN rownum
END) grp_fct, seq_nbr
FROM C1
CROSS JOIN
(VALUES (1), (2)
) AS D(k)
)
SELECT MIN(seq_nbr) AS seq_nbr,
MAX(seq_nbr) AS seq_nbr
FROM C2
GROUP BY grp_fct
HAVING grp_fct > 1;
______________________
|
 |
|
|
ms65g
Constraint Violating Yak Guru
497 Posts |
Posted - 04/11/2011 : 12:37:35
|
And here is the statistics between your #2 and my #2 when using more sample data, or example 4956 records.
Peso 2#
Table 'Sample'. Scan count 1, logical reads 10
Query Cost: 83%
Ms65g 2#
Table 'Sample'. Scan count 1, logical reads 10
Query Cost: 17%
______________________
|
 |
|
|
ms65g
Constraint Violating Yak Guru
497 Posts |
Posted - 04/11/2011 : 13:41:03
|
Also a shorter version of my second solution:
;WITH C AS
(
SELECT seq_nbr,
ROW_NUMBER() OVER(ORDER BY seq_nbr ASC) + k AS grp_fct
FROM [Sample]
CROSS JOIN
(VALUES (0), (1)
) AS D(k)
)
SELECT MIN(seq_nbr) AS seq_nbr,
MAX(seq_nbr) AS seq_nbr
FROM C
GROUP BY grp_fct
HAVING grp_fct > 1;
______________________
|
 |
|
| |
Topic  |
|