SQL Server Forums
Profile | Register | Active Topics | Members | Search | Forum FAQ
 
Register Now and get your question answered!
Username:
Password:
Save Password
Forgot your Password?

 All Forums
 Site Related Forums
 Article Discussion
 Article: Spotting breaks in data sequences
 New Topic  Reply to Topic
 Printer Friendly
Author Previous Topic Topic Next Topic  

AskSQLTeam
Ask SQLTeam Question

USA
0 Posts

Posted - 07/16/2002 :  09:21:52  Show Profile  Visit AskSQLTeam's Homepage  Reply with Quote
neil submitted "A TSQL tutorial on spotting breaks in data sequences"

Article Link.

Page47
Flowing Fount of Yak Knowledge

USA
2878 Posts

Posted - 07/16/2002 :  09:33:07  Show Profile  Reply with Quote
In almost all cases the WHERE NOT EXISTS correlated sub-query will outperform the LEFT JOIN WHERE therightside IS NULL....

Run this in QA and Show Execution Plan...


create table testTable (i int)
go

insert testTable
select 1
union select 2
union select 3
union select 4
union select 6
union select 7
union select 10
union select 11
union select 15
union select 21
go

select 'there is a gap between '
+ convert(varchar, l.i)
+ ' and '
+ (select convert(varchar, min(i)) from testTable where i > l.i)
from testTable l
left join (select i from testTable ) r
on l.i + 1 = r.i
where r.i IS NULL
and l.i < (select max(i) from testTable)
order by l.i

select 'there is a gap between '
+ convert(varchar, l.i)
+ ' and '
+ (select convert(varchar, min(i)) from testTable where i > l.i)
from testTable l
where not exists (
select 47
from testTable r
where l.i + 1 = r.i )
and l.i < (select max(i) from testTable)
order by l.i

go
drop table testTable
go

 
SubTree cost of .266 vs. .196 . . .

[edit]Although, this is highly dependend on indexing. A clustered index on testTable(i) gives the LEFT JOIN query a .0578 and the NOT EXISTS query a .0747.[/edit]
<O>

Edited by - Page47 on 07/16/2002 10:09:01
Go to Top of Page

Arnold Fribble
Yak-finder General

United Kingdom
1961 Posts

Posted - 07/16/2002 :  13:13:08  Show Profile  Reply with Quote
If i is a clustered primary key (etc.) then (SQL Server 2000 certainly) is able to use a merge join between l and r, which is handy. This applies to both the LEFT JOIN and NOT EXISTS versions. Unfortunately, if you try r.i - l.i = 1 it is unable to rearrange this!

BTW, the number of gaps for unique values 1 .. n where each value has an independent probability of being present of p is n*p - n*p^2.


Go to Top of Page

jsmith8858
Dr. Cross Join

USA
7423 Posts

Posted - 11/08/2002 :  16:43:23  Show Profile  Visit jsmith8858's Homepage  Reply with Quote
OK, here's an alternate and shorter very quick way to spot gaps:


SELECT "Gap between " & x.Value & " and " & Max(y.value) AS Message
FROM Vals AS y, Vals AS x
WHERE y.Value < x.[value]
GROUP BY x.Value
HAVING Max(y.Value) <> x.value - 1


If you are using dates, just edit the "x.value - 1" as needed. For example, DateAdd(Day,-1,x.value).

I think this takes care of all cases ..... Have fun!


Go to Top of Page

billsox
Yak Posting Veteran

74 Posts

Posted - 11/08/2002 :  22:30:07  Show Profile  Reply with Quote
jsmith8858, this code is golden! Can you explain exactly what it's doing?

Bill

quote:

OK, here's an alternate and shorter very quick way to spot gaps:


SELECT "Gap between " & x.Value & " and " & Max(y.value) AS Message
FROM Vals AS y, Vals AS x
WHERE y.Value < x.[value]
GROUP BY x.Value
HAVING Max(y.Value) <> x.value - 1


If you are using dates, just edit the "x.value - 1" as needed. For example, DateAdd(Day,-1,x.value).

I think this takes care of all cases ..... Have fun!




Go to Top of Page

nr
SQLTeam MVY

United Kingdom
12543 Posts

Posted - 11/10/2002 :  18:01:38  Show Profile  Visit nr's Homepage  Reply with Quote
SELECT "Gap between " & x.Value & " and " & Max(y.value) AS Message
FROM Vals AS y, Vals AS x
WHERE y.Value < x.[value]
GROUP BY x.Value
HAVING Max(y.Value) <> x.value - 1


WHERE y.Value < x.[value]
get all y values that are less than the x in this group

HAVING Max(y.Value) <> x.value - 1
Exclude groups where there is a a y = x - 1 i.e. no gap

SELECT "Gap between " & x.Value & " and " & Max(y.value)
give the x and max(y) = greatest value less than the x where there is a gap.

same as (hope this works - removed some strange " and &)
SELECT 'Gap between ' + x.Value + ' and ' + y.value AS Message
FROM Vals AS y, Vals AS x
WHERE y.Value = (select max(z.Value) from Vals z where z.Value < x.Value)
and y.Value <> x.Value - 1


==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

jsmith8858
Dr. Cross Join

USA
7423 Posts

Posted - 11/11/2002 :  15:04:49  Show Profile  Visit jsmith8858's Homepage  Reply with Quote
sorry, my SQL was kind of a mix of SQL server and Access.

NR, you are pretty right on about explaining how my query works. However, it is very different from your example for mostly 1 reason: your example needs to enumerate through the "Values" table 3 times, and mine only twice. If that table is really a slow view or huge table, the performance difference can be very important. Also, I am not sure what your example is doing, or why it needs the subquery in the WHERE clause. All of the logic components are provided just by doing a cartesian of the table to itself 1 time.

Another way of wording my query in english:

"For each value in the table (call this "x"), get the highest value ("y") less than x. If x <> (y + 1), there must be a gap between x and y."

Examples like this illustrate the big difference between WHERE and HAVING clauses which confused me for quite some time as I was beginning SQL.





Edited by - jsmith8858 on 11/15/2002 13:50:59
Go to Top of Page

BusmasterJones
Starting Member

4 Posts

Posted - 01/15/2004 :  11:42:30  Show Profile  Reply with Quote
I ran a comparision test between jsmith's HAVING version of the query and a slightly modified (to fix the datatype problem on concatenating the output) version of nr's MAX query. I ran these queries on a table with approx 14000 rows and 33 gaps. The column checked is an IDENTITY column functioning as the PK and has a nonclustered index on it. Here are the results:

nr query: 33 gaps found in < 1 sec execution time
jsmith query: 33 gaps found in 1 min 49 sec execution time

The query plans are radically different. The faster query really only uses the indexes to solve the problem. The slower query gets a cartesian product of the table being searched joined to itself (in this case 14000 x 14000 rows). Because of some filtering first due to the WHERE clause, it is actually more like 14000 x 7000 rows, but this still deals with almost 100,000,000 rows.

Clearly "small" differences in query construction can mean BIG differences in performance.
Go to Top of Page

jsmith8858
Dr. Cross Join

USA
7423 Posts

Posted - 01/15/2004 :  12:36:21  Show Profile  Visit jsmith8858's Homepage  Reply with Quote
you know, i think i wrote that one before i fully grasped the concepts of subqueries in SQL Server -- i was used to doing things more of the old Access way.

And, of course, as time goes on we get much wiser -- when I revisit this problem, I then realize that this might be the most efficient way:

select v1,v2 from
(
SELECT x.i v1, (select max(y.i) from vals y where y.i < x.i) as v2
from vals x
) a
where v1 <> v2+1

which is a little bit faster than NR's method, and requires a little less work.

And MUCH MUCH more efficient than my old way of doing it !!!!

Great job, Busmaster, thanks for the info and doing the actual test. FYI -- here's how I tested:


create table vals(i int primary key, d varchar(10))

GO
declare @i int;
set @i=0
set nocount on
while @i < 14000
	begin
	if @i % 424 <> 0 insert into vals(i, d) values(@i, 'dummy data')
	set @i = @i+1
	end
set nocount off
GO

-- my "new" way:

select v1,v2 from
(
SELECT x.i v1, (select max(y.i) from vals y where y.i < x.i) as v2
from vals x
) a
where v1 <> v2+1

-- NR's method:

SELECT x.i, y.i
FROM Vals y, Vals x 
WHERE y.i = (select max(z.i) from Vals z where z.i < x.i)
and y.i <> x.i - 1

GO

drop table vals


- Jeff

Edited by - jsmith8858 on 01/15/2004 12:37:26
Go to Top of Page

CactusJuice
Starting Member

USA
46 Posts

Posted - 04/15/2005 :  12:25:39  Show Profile  Visit CactusJuice's Homepage  Reply with Quote
What happened to the original article? I'd like to see it. Get a 404 error and Google's cache is empty so I assume it's been missing for a while.
Go to Top of Page

Eejit
Starting Member

1 Posts

Posted - 07/14/2005 :  10:33:48  Show Profile  Reply with Quote
Ditto Cactus Juice! Cannot find original article, which seems to cover the exact problem I've been struggling with all day. Googled it, tried different variations on DNS, etc. Any chance someone could fix the broken link please?
Go to Top of Page

nr
SQLTeam MVY

United Kingdom
12543 Posts

Posted - 07/18/2005 :  15:56:08  Show Profile  Visit nr's Homepage  Reply with Quote
Don't know what the article said but have a look at
http://www.mindsdoor.net/SQLTsql/FindGapsInSequence.html

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page
  Previous Topic Topic Next Topic  
 New Topic  Reply to Topic
 Printer Friendly
Jump To:
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC Go To Top Of Page
This page was generated in 0.08 seconds. Powered By: Snitz Forums 2000