Author 
Topic 
AskSQLTeam
Ask SQLTeam Question
0 Posts 
Posted  20020716 : 09:21:52

neil submitted "A TSQL tutorial on spotting breaks in data sequences" Article Link. 

Page47
Master Smack Fu Yak Hacker
2878 Posts 
Posted  20020716 : 09:33:07

In almost all cases the WHERE NOT EXISTS correlated subquery will outperform the LEFT JOIN WHERE therightside IS NULL....
Run this in QA and Show Execution Plan...
create table testTable (i int) go
insert testTable select 1 union select 2 union select 3 union select 4 union select 6 union select 7 union select 10 union select 11 union select 15 union select 21 go
select 'there is a gap between ' + convert(varchar, l.i) + ' and ' + (select convert(varchar, min(i)) from testTable where i > l.i) from testTable l left join (select i from testTable ) r on l.i + 1 = r.i where r.i IS NULL and l.i < (select max(i) from testTable) order by l.i
select 'there is a gap between ' + convert(varchar, l.i) + ' and ' + (select convert(varchar, min(i)) from testTable where i > l.i) from testTable l where not exists ( select 47 from testTable r where l.i + 1 = r.i ) and l.i < (select max(i) from testTable) order by l.i
go drop table testTable go
SubTree cost of .266 vs. .196 . . .
[edit]Although, this is highly dependend on indexing. A clustered index on testTable(i) gives the LEFT JOIN query a .0578 and the NOT EXISTS query a .0747.[/edit] <O>
Edited by  Page47 on 07/16/2002 10:09:01 


Arnold Fribble
Yakfinder General
1961 Posts 
Posted  20020716 : 13:13:08

If i is a clustered primary key (etc.) then (SQL Server 2000 certainly) is able to use a merge join between l and r, which is handy. This applies to both the LEFT JOIN and NOT EXISTS versions. Unfortunately, if you try r.i  l.i = 1 it is unable to rearrange this!
BTW, the number of gaps for unique values 1 .. n where each value has an independent probability of being present of p is n*p  n*p^2.



jsmith8858
Dr. Cross Join
7423 Posts 
Posted  20021108 : 16:43:23

OK, here's an alternate and shorter very quick way to spot gaps:
SELECT "Gap between " & x.Value & " and " & Max(y.value) AS Message FROM Vals AS y, Vals AS x WHERE y.Value < x.[value] GROUP BY x.Value HAVING Max(y.Value) <> x.value  1
If you are using dates, just edit the "x.value  1" as needed. For example, DateAdd(Day,1,x.value).
I think this takes care of all cases ..... Have fun!



billsox
Yak Posting Veteran
74 Posts 
Posted  20021108 : 22:30:07

jsmith8858, this code is golden! Can you explain exactly what it's doing?
Bill
quote:
OK, here's an alternate and shorter very quick way to spot gaps:
SELECT "Gap between " & x.Value & " and " & Max(y.value) AS Message FROM Vals AS y, Vals AS x WHERE y.Value < x.[value] GROUP BY x.Value HAVING Max(y.Value) <> x.value  1
If you are using dates, just edit the "x.value  1" as needed. For example, DateAdd(Day,1,x.value).
I think this takes care of all cases ..... Have fun!



nr
SQLTeam MVY
12543 Posts 
Posted  20021110 : 18:01:38

SELECT "Gap between " & x.Value & " and " & Max(y.value) AS Message FROM Vals AS y, Vals AS x WHERE y.Value < x.[value] GROUP BY x.Value HAVING Max(y.Value) <> x.value  1
WHERE y.Value < x.[value] get all y values that are less than the x in this group
HAVING Max(y.Value) <> x.value  1 Exclude groups where there is a a y = x  1 i.e. no gap
SELECT "Gap between " & x.Value & " and " & Max(y.value) give the x and max(y) = greatest value less than the x where there is a gap.
same as (hope this works  removed some strange " and &) SELECT 'Gap between ' + x.Value + ' and ' + y.value AS Message FROM Vals AS y, Vals AS x WHERE y.Value = (select max(z.Value) from Vals z where z.Value < x.Value) and y.Value <> x.Value  1
========================================== Cursors are useful if you don't know sql. DTS can be used in a similar way. Beer is not cold and it isn't fizzy. 


jsmith8858
Dr. Cross Join
7423 Posts 
Posted  20021111 : 15:04:49

sorry, my SQL was kind of a mix of SQL server and Access.
NR, you are pretty right on about explaining how my query works. However, it is very different from your example for mostly 1 reason: your example needs to enumerate through the "Values" table 3 times, and mine only twice. If that table is really a slow view or huge table, the performance difference can be very important. Also, I am not sure what your example is doing, or why it needs the subquery in the WHERE clause. All of the logic components are provided just by doing a cartesian of the table to itself 1 time.
Another way of wording my query in english:
"For each value in the table (call this "x"), get the highest value ("y") less than x. If x <> (y + 1), there must be a gap between x and y."
Examples like this illustrate the big difference between WHERE and HAVING clauses which confused me for quite some time as I was beginning SQL.
Edited by  jsmith8858 on 11/15/2002 13:50:59 


BusmasterJones
Starting Member
4 Posts 
Posted  20040115 : 11:42:30

I ran a comparision test between jsmith's HAVING version of the query and a slightly modified (to fix the datatype problem on concatenating the output) version of nr's MAX query. I ran these queries on a table with approx 14000 rows and 33 gaps. The column checked is an IDENTITY column functioning as the PK and has a nonclustered index on it. Here are the results:
nr query: 33 gaps found in < 1 sec execution time jsmith query: 33 gaps found in 1 min 49 sec execution time
The query plans are radically different. The faster query really only uses the indexes to solve the problem. The slower query gets a cartesian product of the table being searched joined to itself (in this case 14000 x 14000 rows). Because of some filtering first due to the WHERE clause, it is actually more like 14000 x 7000 rows, but this still deals with almost 100,000,000 rows.
Clearly "small" differences in query construction can mean BIG differences in performance. 


jsmith8858
Dr. Cross Join
7423 Posts 
Posted  20040115 : 12:36:21

you know, i think i wrote that one before i fully grasped the concepts of subqueries in SQL Server  i was used to doing things more of the old Access way.
And, of course, as time goes on we get much wiser  when I revisit this problem, I then realize that this might be the most efficient way:
select v1,v2 from ( SELECT x.i v1, (select max(y.i) from vals y where y.i < x.i) as v2 from vals x ) a where v1 <> v2+1
which is a little bit faster than NR's method, and requires a little less work.
And MUCH MUCH more efficient than my old way of doing it !!!!
Great job, Busmaster, thanks for the info and doing the actual test. FYI  here's how I tested:
create table vals(i int primary key, d varchar(10))
GO declare @i int; set @i=0 set nocount on while @i < 14000 begin if @i % 424 <> 0 insert into vals(i, d) values(@i, 'dummy data') set @i = @i+1 end set nocount off GO
 my "new" way:
select v1,v2 from ( SELECT x.i v1, (select max(y.i) from vals y where y.i < x.i) as v2 from vals x ) a where v1 <> v2+1
 NR's method:
SELECT x.i, y.i FROM Vals y, Vals x WHERE y.i = (select max(z.i) from Vals z where z.i < x.i) and y.i <> x.i  1
GO
drop table vals
 Jeff 


CactusJuice
Starting Member
46 Posts 
Posted  20050415 : 12:25:39

What happened to the original article? I'd like to see it. Get a 404 error and Google's cache is empty so I assume it's been missing for a while. 


Eejit
Starting Member
1 Post 
Posted  20050714 : 10:33:48

Ditto Cactus Juice! Cannot find original article, which seems to cover the exact problem I've been struggling with all day. Googled it, tried different variations on DNS, etc. Any chance someone could fix the broken link please? 


nr
SQLTeam MVY
12543 Posts 
Posted  20050718 : 15:56:08

Don't know what the article said but have a look at http://www.mindsdoor.net/SQLTsql/FindGapsInSequence.html
========================================== Cursors are useful if you don't know sql. DTS can be used in a similar way. Beer is not cold and it isn't fizzy. 


