Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 Site Related Forums
 Article Discussion
 Article: Spotting breaks in data sequences

Author  Topic 

AskSQLTeam
Ask SQLTeam Question

0 Posts

Posted - 2002-07-16 : 09:21:52
neil submitted "A TSQL tutorial on spotting breaks in data sequences"

Article Link.

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-07-16 : 09:33:07
In almost all cases the WHERE NOT EXISTS correlated sub-query will outperform the LEFT JOIN WHERE therightside IS NULL....

Run this in QA and Show Execution Plan...


create table testTable (i int)
go

insert testTable
select 1
union select 2
union select 3
union select 4
union select 6
union select 7
union select 10
union select 11
union select 15
union select 21
go

select 'there is a gap between '
+ convert(varchar, l.i)
+ ' and '
+ (select convert(varchar, min(i)) from testTable where i > l.i)
from testTable l
left join (select i from testTable ) r
on l.i + 1 = r.i
where r.i IS NULL
and l.i < (select max(i) from testTable)
order by l.i

select 'there is a gap between '
+ convert(varchar, l.i)
+ ' and '
+ (select convert(varchar, min(i)) from testTable where i > l.i)
from testTable l
where not exists (
select 47
from testTable r
where l.i + 1 = r.i )
and l.i < (select max(i) from testTable)
order by l.i

go
drop table testTable
go

 
SubTree cost of .266 vs. .196 . . .

[edit]Although, this is highly dependend on indexing. A clustered index on testTable(i) gives the LEFT JOIN query a .0578 and the NOT EXISTS query a .0747.[/edit]
<O>

Edited by - Page47 on 07/16/2002 10:09:01
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-07-16 : 13:13:08
If i is a clustered primary key (etc.) then (SQL Server 2000 certainly) is able to use a merge join between l and r, which is handy. This applies to both the LEFT JOIN and NOT EXISTS versions. Unfortunately, if you try r.i - l.i = 1 it is unable to rearrange this!

BTW, the number of gaps for unique values 1 .. n where each value has an independent probability of being present of p is n*p - n*p^2.


Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2002-11-08 : 16:43:23
OK, here's an alternate and shorter very quick way to spot gaps:


SELECT "Gap between " & x.Value & " and " & Max(y.value) AS Message
FROM Vals AS y, Vals AS x
WHERE y.Value < x.[value]
GROUP BY x.Value
HAVING Max(y.Value) <> x.value - 1


If you are using dates, just edit the "x.value - 1" as needed. For example, DateAdd(Day,-1,x.value).

I think this takes care of all cases ..... Have fun!


Go to Top of Page

billsox
Yak Posting Veteran

74 Posts

Posted - 2002-11-08 : 22:30:07
jsmith8858, this code is golden! Can you explain exactly what it's doing?

Bill

quote:

OK, here's an alternate and shorter very quick way to spot gaps:


SELECT "Gap between " & x.Value & " and " & Max(y.value) AS Message
FROM Vals AS y, Vals AS x
WHERE y.Value < x.[value]
GROUP BY x.Value
HAVING Max(y.Value) <> x.value - 1


If you are using dates, just edit the "x.value - 1" as needed. For example, DateAdd(Day,-1,x.value).

I think this takes care of all cases ..... Have fun!




Go to Top of Page

nr
SQLTeam MVY

12543 Posts

Posted - 2002-11-10 : 18:01:38
SELECT "Gap between " & x.Value & " and " & Max(y.value) AS Message
FROM Vals AS y, Vals AS x
WHERE y.Value < x.[value]
GROUP BY x.Value
HAVING Max(y.Value) <> x.value - 1


WHERE y.Value < x.[value]
get all y values that are less than the x in this group

HAVING Max(y.Value) <> x.value - 1
Exclude groups where there is a a y = x - 1 i.e. no gap

SELECT "Gap between " & x.Value & " and " & Max(y.value)
give the x and max(y) = greatest value less than the x where there is a gap.

same as (hope this works - removed some strange " and &)
SELECT 'Gap between ' + x.Value + ' and ' + y.value AS Message
FROM Vals AS y, Vals AS x
WHERE y.Value = (select max(z.Value) from Vals z where z.Value < x.Value)
and y.Value <> x.Value - 1


==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2002-11-11 : 15:04:49
sorry, my SQL was kind of a mix of SQL server and Access.

NR, you are pretty right on about explaining how my query works. However, it is very different from your example for mostly 1 reason: your example needs to enumerate through the "Values" table 3 times, and mine only twice. If that table is really a slow view or huge table, the performance difference can be very important. Also, I am not sure what your example is doing, or why it needs the subquery in the WHERE clause. All of the logic components are provided just by doing a cartesian of the table to itself 1 time.

Another way of wording my query in english:

"For each value in the table (call this "x"), get the highest value ("y") less than x. If x <> (y + 1), there must be a gap between x and y."

Examples like this illustrate the big difference between WHERE and HAVING clauses which confused me for quite some time as I was beginning SQL.





Edited by - jsmith8858 on 11/15/2002 13:50:59
Go to Top of Page

BusmasterJones
Starting Member

4 Posts

Posted - 2004-01-15 : 11:42:30
I ran a comparision test between jsmith's HAVING version of the query and a slightly modified (to fix the datatype problem on concatenating the output) version of nr's MAX query. I ran these queries on a table with approx 14000 rows and 33 gaps. The column checked is an IDENTITY column functioning as the PK and has a nonclustered index on it. Here are the results:

nr query: 33 gaps found in < 1 sec execution time
jsmith query: 33 gaps found in 1 min 49 sec execution time

The query plans are radically different. The faster query really only uses the indexes to solve the problem. The slower query gets a cartesian product of the table being searched joined to itself (in this case 14000 x 14000 rows). Because of some filtering first due to the WHERE clause, it is actually more like 14000 x 7000 rows, but this still deals with almost 100,000,000 rows.

Clearly "small" differences in query construction can mean BIG differences in performance.
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2004-01-15 : 12:36:21
you know, i think i wrote that one before i fully grasped the concepts of subqueries in SQL Server -- i was used to doing things more of the old Access way.

And, of course, as time goes on we get much wiser -- when I revisit this problem, I then realize that this might be the most efficient way:

select v1,v2 from
(
SELECT x.i v1, (select max(y.i) from vals y where y.i < x.i) as v2
from vals x
) a
where v1 <> v2+1

which is a little bit faster than NR's method, and requires a little less work.

And MUCH MUCH more efficient than my old way of doing it !!!!

Great job, Busmaster, thanks for the info and doing the actual test. FYI -- here's how I tested:


create table vals(i int primary key, d varchar(10))

GO
declare @i int;
set @i=0
set nocount on
while @i < 14000
begin
if @i % 424 <> 0 insert into vals(i, d) values(@i, 'dummy data')
set @i = @i+1
end
set nocount off
GO

-- my "new" way:

select v1,v2 from
(
SELECT x.i v1, (select max(y.i) from vals y where y.i < x.i) as v2
from vals x
) a
where v1 <> v2+1

-- NR's method:

SELECT x.i, y.i
FROM Vals y, Vals x
WHERE y.i = (select max(z.i) from Vals z where z.i < x.i)
and y.i <> x.i - 1

GO

drop table vals


- Jeff
Go to Top of Page

CactusJuice
Starting Member

46 Posts

Posted - 2005-04-15 : 12:25:39
What happened to the original article? I'd like to see it. Get a 404 error and Google's cache is empty so I assume it's been missing for a while.
Go to Top of Page

Eejit
Starting Member

1 Post

Posted - 2005-07-14 : 10:33:48
Ditto Cactus Juice! Cannot find original article, which seems to cover the exact problem I've been struggling with all day. Googled it, tried different variations on DNS, etc. Any chance someone could fix the broken link please?
Go to Top of Page

nr
SQLTeam MVY

12543 Posts

Posted - 2005-07-18 : 15:56:08
Don't know what the article said but have a look at
http://www.mindsdoor.net/SQLTsql/FindGapsInSequence.html

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page
   

- Advertisement -