Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 How to identify number series

Author  Topic 

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2009-10-21 : 10:06:23
I have a sql query problem that I haven't been able to solve myself for a while now so I'm hoping for some help *experts only please* (for those not familiar with our old friends; that was a joke...any suggestions are more than welcome )

What I basically need to do is to identify a) phone numbers that are a part of a series and b) phone numbers that are not a part of a series, from a table of land line phone numbers. A series is defined as phone numbers in sequence that has 10 or more consecutive numbers in a row. Somewhat simple requirements, but the implementation wasn't equally simple (at least not to me). Here's what I got so far:
CREATE TABLE numbers (
Number bigint NOT NULL PRIMARY KEY CLUSTERED
)

INSERT INTO numbers
SELECT 4488726695 UNION ALL SELECT 4488726696 UNION ALL SELECT 4488726697 UNION ALL
SELECT 4488726698 UNION ALL SELECT 4488726600 UNION ALL SELECT 4488726701 UNION ALL
SELECT 4488726702 UNION ALL SELECT 4488726703 UNION ALL SELECT 4488726704 UNION ALL
SELECT 4488726705 UNION ALL SELECT 4488726706 UNION ALL SELECT 4488726707 UNION ALL
SELECT 4488726708 UNION ALL SELECT 4488726709 UNION ALL SELECT 4488726710 UNION ALL
SELECT 4488726715 UNION ALL SELECT 4488726717 UNION ALL SELECT 4488726718 UNION ALL
SELECT 4488726719 UNION ALL SELECT 4488726720 UNION ALL SELECT 4488726721 UNION ALL
SELECT 4488726722 UNION ALL SELECT 4488726723 UNION ALL SELECT 4488726724 UNION ALL
SELECT 4488726725 UNION ALL SELECT 4488726726 UNION ALL SELECT 4488726727 UNION ALL
SELECT 4488726728 UNION ALL SELECT 4488726729 UNION ALL SELECT 4488726730 UNION ALL
SELECT 4488726731 UNION ALL SELECT 4488726732 UNION ALL SELECT 4488726733 UNION ALL
SELECT 4488726734 UNION ALL SELECT 4488726739 UNION ALL SELECT 4488726743

WITH cte(ID, Number) AS (
SELECT
RowNumber = ROW_NUMBER() OVER (ORDER BY number),
number
FROM numbers
)

SELECT
x.ID,
x.Number,
SeriesRowNum = ROW_NUMBER() OVER (PARTITION BY Consecutive ORDER BY x.ID),
IsSeries = 'CASE WHEN something THEN 1 ELSE 0 END'
FROM
(SELECT
a.ID,
a.Number,
Consecutive = CASE WHEN b.Number - a.Number = 1 THEN 1 ELSE 0 END
FROM cte a
INNER JOIN cte b
ON a.ID = (b.ID-1)
) as x
ORDER BY x.ID
I have no idea if this is the right/best approach but it has gotten me to a point where I'm at least able to find consecutive numbers. The problem I have now is to get the SeriesRowNum to start at 1 again once it finds a number that's not consecutive and then to find a way to create an IsSeries column that says if a number is in a series or not.

Any takers??

- Lumbago
http://xkcd.com/327/

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2009-10-21 : 10:15:19
The ultimate output of this query would actually be something like this where each series has a unique synthetic ID attached to it:
ID   Number       SeriesID
1 4488726600 0
2 4488726695 0
3 4488726696 0
4 4488726697 0
5 4488726698 0
6 4488726701 1
7 4488726702 1
8 4488726703 1
9 4488726704 1
10 4488726705 1
11 4488726706 1
12 4488726707 1
13 4488726708 1
14 4488726709 1
15 4488726710 1
16 4488726715 0
...
--> and then the next series would have the ID 2
But that would really be far out...I'm happy with just about anything at this point. And performance is not an issue, this is a one time deal...but a few 100k numbers though.

- Lumbago
http://xkcd.com/327/
Go to Top of Page

khtan
In (Som, Ni, Yak)

17689 Posts

Posted - 2009-10-21 : 10:21:40
quote:
Need a regular table since table variables are not supported in CTE's

Table variable does support in CTE


KH
[spoiler]Time is always against us[/spoiler]

Go to Top of Page

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2009-10-21 : 10:24:24
>>Table variable does support in CTE

I couldn't get it to work but that's not really relevant. I'll remove the remark...

- Lumbago
http://xkcd.com/327/
Go to Top of Page

Ifor
Aged Yak Warrior

700 Posts

Posted - 2009-10-21 : 10:43:38
You could try the row difference technique.
(This won some competition but I cannot remember which one.)


;WITH cte
AS
(
SELECT
Number
,ROW_NUMBER() OVER (ORDER BY Number) As RowNum
FROM Numbers
)
,cte2
AS
(
SELECT *
,DENSE_RANK() OVER (ORDER BY Number - RowNum) As Series
FROM cte
)
,cte3
AS
(
SELECT *
,COUNT(*) OVER (PARTITION BY Series) AS SCount
FROM cte2

)
SELECT Number
,CASE
WHEN SCount = 1
THEN 0
ELSE 1
END AS InSeries
FROM cte3
Go to Top of Page

AndrewMurphy
Master Smack Fu Yak Hacker

2916 Posts

Posted - 2009-10-21 : 11:44:40
why not seek 2 numbers which are 10 apart and who have 9/10 numbers in between.

select a.id, b.id, count(*) from numbers a
inner join numbers b on a.id = b.id + 10
inner join numbers c on a.id < c.id and c.id < b.10
group by a.id, b.id
having count(*) >= 10

is this approach workable?
Go to Top of Page

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2009-10-21 : 13:21:23
quote:
Originally posted by Ifor

You could try the row difference technique.
(This won some competition but I cannot remember which one.)

<snip>
That is probably the fastest way that I know of. If it only a one time deal, it should be fine, but you might want to perf test it.

small correction:
SELECT Number
,CASE
WHEN SCount < 10
THEN 0
ELSE 1
END AS InSeries
FROM cte3
Go to Top of Page

khtan
In (Som, Ni, Yak)

17689 Posts

Posted - 2009-10-21 : 22:05:32
[code]
DECLARE @Numbers TABLE
(
Number bigint NOT NULL PRIMARY KEY CLUSTERED
)

INSERT INTO @Numbers
SELECT 4488726695 UNION ALL SELECT 4488726696 UNION ALL SELECT 4488726697 UNION ALL
SELECT 4488726698 UNION ALL SELECT 4488726600 UNION ALL SELECT 4488726701 UNION ALL
SELECT 4488726702 UNION ALL SELECT 4488726703 UNION ALL SELECT 4488726704 UNION ALL
SELECT 4488726705 UNION ALL SELECT 4488726706 UNION ALL SELECT 4488726707 UNION ALL
SELECT 4488726708 UNION ALL SELECT 4488726709 UNION ALL SELECT 4488726710 UNION ALL
SELECT 4488726715 UNION ALL SELECT 4488726717 UNION ALL SELECT 4488726718 UNION ALL
SELECT 4488726719 UNION ALL SELECT 4488726720 UNION ALL SELECT 4488726721 UNION ALL
SELECT 4488726722 UNION ALL SELECT 4488726723 UNION ALL SELECT 4488726724 UNION ALL
SELECT 4488726725 UNION ALL SELECT 4488726726 UNION ALL SELECT 4488726727 UNION ALL
SELECT 4488726728 UNION ALL SELECT 4488726729 UNION ALL SELECT 4488726730 UNION ALL
SELECT 4488726731 UNION ALL SELECT 4488726732 UNION ALL SELECT 4488726733 UNION ALL
SELECT 4488726734 UNION ALL SELECT 4488726739 UNION ALL SELECT 4488726743

; WITH cte(RowNumber, Number)
AS
(
SELECT RowNumber = ROW_Number() OVER (ORDER BY Number),
Number
FROM @Numbers
),
rcte
AS
(
SELECT c1.RowNumber, c1.Number, Series = 1
FROM cte c1
WHERE c1.RowNumber = 1

UNION ALL

SELECT c.RowNumber, c.Number,
Series = CASE WHEN r.Number = c.Number - 1 THEN r.Series ELSE r.Series + 1 END
FROM rcte r
INNER JOIN cte c ON r.RowNumber = c.RowNumber - 1
)
SELECT RowNumber, Number, Series, SeriesNo,
SeriesID = CASE WHEN SeriesNo = 0 THEN 0 ELSE dense_rank() OVER (ORDER BY SeriesNo) - 1 END
FROM
(
SELECT RowNumber, Number,
Series,
SeriesNo = CASE WHEN COUNT(*) OVER (PARTITION BY Series) < 10 THEN 0 ELSE Series END
FROM rcte
) s
ORDER BY RowNumber

/*
RowNumber Number Series SeriesNo SeriesID
-------------------- -------------------- ----------- ----------- --------------------
1 4488726600 1 0 0
2 4488726695 2 0 0
3 4488726696 2 0 0
4 4488726697 2 0 0
5 4488726698 2 0 0
6 4488726701 3 3 1
7 4488726702 3 3 1
8 4488726703 3 3 1
9 4488726704 3 3 1
10 4488726705 3 3 1
11 4488726706 3 3 1
12 4488726707 3 3 1
13 4488726708 3 3 1
14 4488726709 3 3 1
15 4488726710 3 3 1
16 4488726715 4 0 0
17 4488726717 5 5 2
18 4488726718 5 5 2
19 4488726719 5 5 2
20 4488726720 5 5 2
21 4488726721 5 5 2
22 4488726722 5 5 2
23 4488726723 5 5 2
24 4488726724 5 5 2
25 4488726725 5 5 2
26 4488726726 5 5 2
27 4488726727 5 5 2
28 4488726728 5 5 2
29 4488726729 5 5 2
30 4488726730 5 5 2
31 4488726731 5 5 2
32 4488726732 5 5 2
33 4488726733 5 5 2
34 4488726734 5 5 2
35 4488726739 6 0 0
36 4488726743 7 0 0

(36 row(s) affected)
*/
[/code]


KH
[spoiler]Time is always against us[/spoiler]

Go to Top of Page

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2009-10-22 : 03:30:45
Now this is why I love SQLTeam and keep coming back all the time...the knowledge and help you get from the people here is just mindblowing!! It's gonna take me a while to go through each of the suggestions and try to understand and adapt them, but I have to say that Khtans solution looks *really* promising

- Lumbago
http://xkcd.com/327/
Go to Top of Page

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2009-10-22 : 03:59:43
Ok, since khtans solution looked the most promising I checked that one first and it worked *perfectly* except for one thing which might mess everything up; when I run it against my real table I got the following error:

Msg 530, Level 16, State 1, Line 18
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.


- Lumbago
http://xkcd.com/327/
Go to Top of Page

khtan
In (Som, Ni, Yak)

17689 Posts

Posted - 2009-10-22 : 04:06:48
set the MAXRECURSION to 0

add this to the last line
OPTION (MAXRECURSION 0)


KH
[spoiler]Time is always against us[/spoiler]

Go to Top of Page

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2009-10-22 : 04:25:12
Well...there is one new level of recursion for every number in the table which means that I will have a few 100k levels of recursion when I run this on my real data. I'm suspecting that the server will not enjoy this very much...?

- Lumbago
http://xkcd.com/327/
Go to Top of Page

khtan
In (Som, Ni, Yak)

17689 Posts

Posted - 2009-10-22 : 04:35:16
Do you have a test server ? Let's see how does the server takes it


KH
[spoiler]Time is always against us[/spoiler]

Go to Top of Page

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2009-10-22 : 05:06:56
Hehe...yes I do have a test server and I ran it on 5000 numbers. The query took 2 mins 32 secs and the server didn't go completely haywire but pretty much. All 4 CPU cores were running on 60-70% for the duration of the query but as I said previously performance really isn't an issue here since this a one-time deal and I can run it at off-peak hours. I complies 100% to the requirements and for that I am forever grateful.

*I owe you bigtime!!* (hmm...I'm getting in debt to quite a few people here )

- Lumbago
http://xkcd.com/327/
Go to Top of Page

madhivanan
Premature Yak Congratulator

22864 Posts

Posted - 2009-10-22 : 05:12:10
If you worried about the recursion, try this Quirky update technique

DECLARE @Numbers TABLE
(
Number bigint NOT NULL PRIMARY KEY CLUSTERED,
sno int
)

INSERT INTO @Numbers (number)
SELECT 4488726695 UNION ALL SELECT 4488726696 UNION ALL SELECT 4488726697 UNION ALL
SELECT 4488726698 UNION ALL SELECT 4488726600 UNION ALL SELECT 4488726701 UNION ALL
SELECT 4488726702 UNION ALL SELECT 4488726703 UNION ALL SELECT 4488726704 UNION ALL
SELECT 4488726705 UNION ALL SELECT 4488726706 UNION ALL SELECT 4488726707 UNION ALL
SELECT 4488726708 UNION ALL SELECT 4488726709 UNION ALL SELECT 4488726710 UNION ALL
SELECT 4488726715 UNION ALL SELECT 4488726717 UNION ALL SELECT 4488726718 UNION ALL
SELECT 4488726719 UNION ALL SELECT 4488726720 UNION ALL SELECT 4488726721 UNION ALL
SELECT 4488726722 UNION ALL SELECT 4488726723 UNION ALL SELECT 4488726724 UNION ALL
SELECT 4488726725 UNION ALL SELECT 4488726726 UNION ALL SELECT 4488726727 UNION ALL
SELECT 4488726728 UNION ALL SELECT 4488726729 UNION ALL SELECT 4488726730 UNION ALL
SELECT 4488726731 UNION ALL SELECT 4488726732 UNION ALL SELECT 4488726733 UNION ALL
SELECT 4488726734 UNION ALL SELECT 4488726739 UNION ALL SELECT 4488726743

declare @sno int,@number bigint
select @sno=0, @number=0

update @Numbers
set sno=@sno,
@sno=case when @number-number=-1 then 1 else 0 end,@number=number

select @sno=0, @number=0

update @Numbers
set sno=@sno,
@sno=case when @number-number=-1 then @sno else @sno+1 end,@number=number

select number,dense_rank() over (order by case when series_no<10 then 0 else series_no end)-1 as series_no from
(
select *,count(*) over (partition by sno) as series_no from @numbers
) as t
order by number


Find more here about Quirky update technique
http://sqlblogcasts.com/blogs/madhivanan/archive/2009/06/10/quirky-update-in-sql-server.aspx

Madhivanan

Failing to plan is Planning to fail
Go to Top of Page

madhivanan
Premature Yak Congratulator

22864 Posts

Posted - 2009-10-22 : 06:44:21
Dont need two updations

DECLARE @Numbers TABLE
(
Number bigint NOT NULL PRIMARY KEY CLUSTERED,
sno int
)

INSERT INTO @Numbers (number)
SELECT 4488726695 UNION ALL SELECT 4488726696 UNION ALL SELECT 4488726697 UNION ALL
SELECT 4488726698 UNION ALL SELECT 4488726600 UNION ALL SELECT 4488726701 UNION ALL
SELECT 4488726702 UNION ALL SELECT 4488726703 UNION ALL SELECT 4488726704 UNION ALL
SELECT 4488726705 UNION ALL SELECT 4488726706 UNION ALL SELECT 4488726707 UNION ALL
SELECT 4488726708 UNION ALL SELECT 4488726709 UNION ALL SELECT 4488726710 UNION ALL
SELECT 4488726715 UNION ALL SELECT 4488726717 UNION ALL SELECT 4488726718 UNION ALL
SELECT 4488726719 UNION ALL SELECT 4488726720 UNION ALL SELECT 4488726721 UNION ALL
SELECT 4488726722 UNION ALL SELECT 4488726723 UNION ALL SELECT 4488726724 UNION ALL
SELECT 4488726725 UNION ALL SELECT 4488726726 UNION ALL SELECT 4488726727 UNION ALL
SELECT 4488726728 UNION ALL SELECT 4488726729 UNION ALL SELECT 4488726730 UNION ALL
SELECT 4488726731 UNION ALL SELECT 4488726732 UNION ALL SELECT 4488726733 UNION ALL
SELECT 4488726734 UNION ALL SELECT 4488726739 UNION ALL SELECT 4488726743

declare @sno int,@number bigint
select @sno=0, @number=0


update @Numbers
set sno=@sno,
@sno=case when @number-number=-1 then @sno else @sno+1 end,@number=number

select number,dense_rank() over (order by case when series_no<10 then 0 else series_no end)-1 as series_no from
(
select *,count(*) over (partition by sno) as series_no from @numbers
) as t
order by number


Madhivanan

Failing to plan is Planning to fail
Go to Top of Page

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2009-10-22 : 07:52:12
I've done quite a bit of testing now and performance-wise it's a complete no-brainer: working through my 237k numbers takes 12secs using madhis script and khtans script takes 2.5 min for 5000 numbers. However madhis version is not returning the right results...the final query reports the same series_no for several of the series even though they are not connected i any way. I have tried but I haven't been able to figure out why yet.

- Lumbago
http://xkcd.com/327/
Go to Top of Page

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2009-10-22 : 08:29:36
I've fiddled a little with madhis version now and I have to say that this Quirky update method is really cool!! I had to do a slight change (in red) an now it actually seems to be working 100%...I'm simply amazed:
...
select
number,
series_id = dense_rank() over (order by case when series_count < 10 then 0 else sno end)-1
from
(select *, series_count = count(*) over (partition by sno) from @numbers) as t
order by number
I'm still trying to figure out what's going on though so this might not be the end of it but I have to say that it looks extremely promising! 237k numbers processed in 12 seconds _including_ transferring all the data to my client :)

- Lumbago
http://xkcd.com/327/
Go to Top of Page

madhivanan
Premature Yak Congratulator

22864 Posts

Posted - 2009-10-22 : 08:52:00
quote:
Originally posted by Lumbago

I've fiddled a little with madhis version now and I have to say that this Quirky update method is really cool!! I had to do a slight change (in red) an now it actually seems to be working 100%...I'm simply amazed:
...
select
number,
series_id = dense_rank() over (order by case when series_count < 10 then 0 else sno end)-1
from
(select *, series_count = count(*) over (partition by sno) from @numbers) as t
order by number
I'm still trying to figure out what's going on though so this might not be the end of it but I have to say that it looks extremely promising! 237k numbers processed in 12 seconds _including_ transferring all the data to my client :)

- Lumbago
http://xkcd.com/327/


Thanks for your feedback

Madhivanan

Failing to plan is Planning to fail
Go to Top of Page

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2009-10-22 : 11:59:29
Lumbago,

Did you happen to test Ifors method? I'm not sure about execution time, but on a small sample data set it seems to about twice as effecient as the Quirky method. Just curious about the performance in the "real world." :)
Go to Top of Page

Ifor
Aged Yak Warrior

700 Posts

Posted - 2009-10-22 : 12:25:46
I would also be interested in the performance of the method I posted.

Also, the 'Quirky' method is fundamentally unrelational and not documented. ie It could be broken by a patch.
Go to Top of Page
    Next Page

- Advertisement -