| Author |
Topic |
|
Lumbago
Norsk Yak Master
3271 Posts |
Posted - 2009-10-21 : 10:06:23
|
I have a sql query problem that I haven't been able to solve myself for a while now so I'm hoping for some help *experts only please* (for those not familiar with our old friends; that was a joke...any suggestions are more than welcome )What I basically need to do is to identify a) phone numbers that are a part of a series and b) phone numbers that are not a part of a series, from a table of land line phone numbers. A series is defined as phone numbers in sequence that has 10 or more consecutive numbers in a row. Somewhat simple requirements, but the implementation wasn't equally simple (at least not to me). Here's what I got so far:CREATE TABLE numbers ( Number bigint NOT NULL PRIMARY KEY CLUSTERED ) INSERT INTO numbers SELECT 4488726695 UNION ALL SELECT 4488726696 UNION ALL SELECT 4488726697 UNION ALL SELECT 4488726698 UNION ALL SELECT 4488726600 UNION ALL SELECT 4488726701 UNION ALL SELECT 4488726702 UNION ALL SELECT 4488726703 UNION ALL SELECT 4488726704 UNION ALL SELECT 4488726705 UNION ALL SELECT 4488726706 UNION ALL SELECT 4488726707 UNION ALL SELECT 4488726708 UNION ALL SELECT 4488726709 UNION ALL SELECT 4488726710 UNION ALL SELECT 4488726715 UNION ALL SELECT 4488726717 UNION ALL SELECT 4488726718 UNION ALL SELECT 4488726719 UNION ALL SELECT 4488726720 UNION ALL SELECT 4488726721 UNION ALL SELECT 4488726722 UNION ALL SELECT 4488726723 UNION ALL SELECT 4488726724 UNION ALL SELECT 4488726725 UNION ALL SELECT 4488726726 UNION ALL SELECT 4488726727 UNION ALL SELECT 4488726728 UNION ALL SELECT 4488726729 UNION ALL SELECT 4488726730 UNION ALL SELECT 4488726731 UNION ALL SELECT 4488726732 UNION ALL SELECT 4488726733 UNION ALL SELECT 4488726734 UNION ALL SELECT 4488726739 UNION ALL SELECT 4488726743WITH cte(ID, Number) AS ( SELECT RowNumber = ROW_NUMBER() OVER (ORDER BY number), number FROM numbers )SELECT x.ID, x.Number, SeriesRowNum = ROW_NUMBER() OVER (PARTITION BY Consecutive ORDER BY x.ID), IsSeries = 'CASE WHEN something THEN 1 ELSE 0 END'FROM (SELECT a.ID, a.Number, Consecutive = CASE WHEN b.Number - a.Number = 1 THEN 1 ELSE 0 END FROM cte a INNER JOIN cte b ON a.ID = (b.ID-1) ) as xORDER BY x.ID I have no idea if this is the right/best approach but it has gotten me to a point where I'm at least able to find consecutive numbers. The problem I have now is to get the SeriesRowNum to start at 1 again once it finds a number that's not consecutive and then to find a way to create an IsSeries column that says if a number is in a series or not.Any takers??- Lumbagohttp://xkcd.com/327/ |
|
|
Lumbago
Norsk Yak Master
3271 Posts |
Posted - 2009-10-21 : 10:15:19
|
The ultimate output of this query would actually be something like this where each series has a unique synthetic ID attached to it: ID Number SeriesID1 4488726600 02 4488726695 03 4488726696 04 4488726697 05 4488726698 06 4488726701 17 4488726702 18 4488726703 19 4488726704 110 4488726705 111 4488726706 112 4488726707 113 4488726708 114 4488726709 115 4488726710 116 4488726715 0...--> and then the next series would have the ID 2 But that would really be far out...I'm happy with just about anything at this point. And performance is not an issue, this is a one time deal...but a few 100k numbers though. - Lumbagohttp://xkcd.com/327/ |
 |
|
|
khtan
In (Som, Ni, Yak)
17689 Posts |
Posted - 2009-10-21 : 10:21:40
|
quote: Need a regular table since table variables are not supported in CTE's
Table variable does support in CTE KH[spoiler]Time is always against us[/spoiler] |
 |
|
|
Lumbago
Norsk Yak Master
3271 Posts |
Posted - 2009-10-21 : 10:24:24
|
| >>Table variable does support in CTEI couldn't get it to work but that's not really relevant. I'll remove the remark...- Lumbagohttp://xkcd.com/327/ |
 |
|
|
Ifor
Aged Yak Warrior
700 Posts |
Posted - 2009-10-21 : 10:43:38
|
You could try the row difference technique.(This won some competition but I cannot remember which one.);WITH cteAS( SELECT Number ,ROW_NUMBER() OVER (ORDER BY Number) As RowNum FROM Numbers),cte2AS( SELECT * ,DENSE_RANK() OVER (ORDER BY Number - RowNum) As Series FROM cte),cte3AS( SELECT * ,COUNT(*) OVER (PARTITION BY Series) AS SCount FROM cte2 )SELECT Number ,CASE WHEN SCount = 1 THEN 0 ELSE 1 END AS InSeriesFROM cte3 |
 |
|
|
AndrewMurphy
Master Smack Fu Yak Hacker
2916 Posts |
Posted - 2009-10-21 : 11:44:40
|
| why not seek 2 numbers which are 10 apart and who have 9/10 numbers in between.select a.id, b.id, count(*) from numbers ainner join numbers b on a.id = b.id + 10inner join numbers c on a.id < c.id and c.id < b.10group by a.id, b.idhaving count(*) >= 10is this approach workable? |
 |
|
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2009-10-21 : 13:21:23
|
quote: Originally posted by Ifor You could try the row difference technique.(This won some competition but I cannot remember which one.)<snip>
That is probably the fastest way that I know of. If it only a one time deal, it should be fine, but you might want to perf test it.small correction:SELECT Number ,CASE WHEN SCount < 10 THEN 0 ELSE 1 END AS InSeriesFROM cte3 |
 |
|
|
khtan
In (Som, Ni, Yak)
17689 Posts |
Posted - 2009-10-21 : 22:05:32
|
[code]DECLARE @Numbers TABLE ( Number bigint NOT NULL PRIMARY KEY CLUSTERED) INSERT INTO @Numbers SELECT 4488726695 UNION ALL SELECT 4488726696 UNION ALL SELECT 4488726697 UNION ALL SELECT 4488726698 UNION ALL SELECT 4488726600 UNION ALL SELECT 4488726701 UNION ALL SELECT 4488726702 UNION ALL SELECT 4488726703 UNION ALL SELECT 4488726704 UNION ALL SELECT 4488726705 UNION ALL SELECT 4488726706 UNION ALL SELECT 4488726707 UNION ALL SELECT 4488726708 UNION ALL SELECT 4488726709 UNION ALL SELECT 4488726710 UNION ALL SELECT 4488726715 UNION ALL SELECT 4488726717 UNION ALL SELECT 4488726718 UNION ALL SELECT 4488726719 UNION ALL SELECT 4488726720 UNION ALL SELECT 4488726721 UNION ALL SELECT 4488726722 UNION ALL SELECT 4488726723 UNION ALL SELECT 4488726724 UNION ALL SELECT 4488726725 UNION ALL SELECT 4488726726 UNION ALL SELECT 4488726727 UNION ALL SELECT 4488726728 UNION ALL SELECT 4488726729 UNION ALL SELECT 4488726730 UNION ALL SELECT 4488726731 UNION ALL SELECT 4488726732 UNION ALL SELECT 4488726733 UNION ALL SELECT 4488726734 UNION ALL SELECT 4488726739 UNION ALL SELECT 4488726743; WITH cte(RowNumber, Number) AS ( SELECT RowNumber = ROW_Number() OVER (ORDER BY Number), Number FROM @Numbers),rcteAS( SELECT c1.RowNumber, c1.Number, Series = 1 FROM cte c1 WHERE c1.RowNumber = 1 UNION ALL SELECT c.RowNumber, c.Number, Series = CASE WHEN r.Number = c.Number - 1 THEN r.Series ELSE r.Series + 1 END FROM rcte r INNER JOIN cte c ON r.RowNumber = c.RowNumber - 1)SELECT RowNumber, Number, Series, SeriesNo, SeriesID = CASE WHEN SeriesNo = 0 THEN 0 ELSE dense_rank() OVER (ORDER BY SeriesNo) - 1 ENDFROM( SELECT RowNumber, Number, Series, SeriesNo = CASE WHEN COUNT(*) OVER (PARTITION BY Series) < 10 THEN 0 ELSE Series END FROM rcte) sORDER BY RowNumber/*RowNumber Number Series SeriesNo SeriesID -------------------- -------------------- ----------- ----------- -------------------- 1 4488726600 1 0 02 4488726695 2 0 03 4488726696 2 0 04 4488726697 2 0 05 4488726698 2 0 06 4488726701 3 3 17 4488726702 3 3 18 4488726703 3 3 19 4488726704 3 3 110 4488726705 3 3 111 4488726706 3 3 112 4488726707 3 3 113 4488726708 3 3 114 4488726709 3 3 115 4488726710 3 3 116 4488726715 4 0 017 4488726717 5 5 218 4488726718 5 5 219 4488726719 5 5 220 4488726720 5 5 221 4488726721 5 5 222 4488726722 5 5 223 4488726723 5 5 224 4488726724 5 5 225 4488726725 5 5 226 4488726726 5 5 227 4488726727 5 5 228 4488726728 5 5 229 4488726729 5 5 230 4488726730 5 5 231 4488726731 5 5 232 4488726732 5 5 233 4488726733 5 5 234 4488726734 5 5 235 4488726739 6 0 036 4488726743 7 0 0(36 row(s) affected)*/[/code] KH[spoiler]Time is always against us[/spoiler] |
 |
|
|
Lumbago
Norsk Yak Master
3271 Posts |
Posted - 2009-10-22 : 03:30:45
|
Now this is why I love SQLTeam and keep coming back all the time...the knowledge and help you get from the people here is just mindblowing!! It's gonna take me a while to go through each of the suggestions and try to understand and adapt them, but I have to say that Khtans solution looks *really* promising - Lumbagohttp://xkcd.com/327/ |
 |
|
|
Lumbago
Norsk Yak Master
3271 Posts |
Posted - 2009-10-22 : 03:59:43
|
| Ok, since khtans solution looked the most promising I checked that one first and it worked *perfectly* except for one thing which might mess everything up; when I run it against my real table I got the following error:Msg 530, Level 16, State 1, Line 18The statement terminated. The maximum recursion 100 has been exhausted before statement completion.- Lumbagohttp://xkcd.com/327/ |
 |
|
|
khtan
In (Som, Ni, Yak)
17689 Posts |
Posted - 2009-10-22 : 04:06:48
|
set the MAXRECURSION to 0add this to the last lineOPTION (MAXRECURSION 0) KH[spoiler]Time is always against us[/spoiler] |
 |
|
|
Lumbago
Norsk Yak Master
3271 Posts |
Posted - 2009-10-22 : 04:25:12
|
| Well...there is one new level of recursion for every number in the table which means that I will have a few 100k levels of recursion when I run this on my real data. I'm suspecting that the server will not enjoy this very much...?- Lumbagohttp://xkcd.com/327/ |
 |
|
|
khtan
In (Som, Ni, Yak)
17689 Posts |
Posted - 2009-10-22 : 04:35:16
|
Do you have a test server ? Let's see how does the server takes it  KH[spoiler]Time is always against us[/spoiler] |
 |
|
|
Lumbago
Norsk Yak Master
3271 Posts |
Posted - 2009-10-22 : 05:06:56
|
Hehe...yes I do have a test server and I ran it on 5000 numbers. The query took 2 mins 32 secs and the server didn't go completely haywire but pretty much. All 4 CPU cores were running on 60-70% for the duration of the query but as I said previously performance really isn't an issue here since this a one-time deal and I can run it at off-peak hours. I complies 100% to the requirements and for that I am forever grateful.*I owe you bigtime!!* (hmm...I'm getting in debt to quite a few people here )- Lumbagohttp://xkcd.com/327/ |
 |
|
|
madhivanan
Premature Yak Congratulator
22864 Posts |
Posted - 2009-10-22 : 05:12:10
|
If you worried about the recursion, try this Quirky update techniqueDECLARE @Numbers TABLE ( Number bigint NOT NULL PRIMARY KEY CLUSTERED, sno int) INSERT INTO @Numbers (number)SELECT 4488726695 UNION ALL SELECT 4488726696 UNION ALL SELECT 4488726697 UNION ALL SELECT 4488726698 UNION ALL SELECT 4488726600 UNION ALL SELECT 4488726701 UNION ALL SELECT 4488726702 UNION ALL SELECT 4488726703 UNION ALL SELECT 4488726704 UNION ALL SELECT 4488726705 UNION ALL SELECT 4488726706 UNION ALL SELECT 4488726707 UNION ALL SELECT 4488726708 UNION ALL SELECT 4488726709 UNION ALL SELECT 4488726710 UNION ALL SELECT 4488726715 UNION ALL SELECT 4488726717 UNION ALL SELECT 4488726718 UNION ALL SELECT 4488726719 UNION ALL SELECT 4488726720 UNION ALL SELECT 4488726721 UNION ALL SELECT 4488726722 UNION ALL SELECT 4488726723 UNION ALL SELECT 4488726724 UNION ALL SELECT 4488726725 UNION ALL SELECT 4488726726 UNION ALL SELECT 4488726727 UNION ALL SELECT 4488726728 UNION ALL SELECT 4488726729 UNION ALL SELECT 4488726730 UNION ALL SELECT 4488726731 UNION ALL SELECT 4488726732 UNION ALL SELECT 4488726733 UNION ALL SELECT 4488726734 UNION ALL SELECT 4488726739 UNION ALL SELECT 4488726743declare @sno int,@number bigintselect @sno=0, @number=0update @Numbersset sno=@sno,@sno=case when @number-number=-1 then 1 else 0 end,@number=numberselect @sno=0, @number=0update @Numbersset sno=@sno,@sno=case when @number-number=-1 then @sno else @sno+1 end,@number=numberselect number,dense_rank() over (order by case when series_no<10 then 0 else series_no end)-1 as series_no from(select *,count(*) over (partition by sno) as series_no from @numbers) as torder by number Find more here about Quirky update techniquehttp://sqlblogcasts.com/blogs/madhivanan/archive/2009/06/10/quirky-update-in-sql-server.aspxMadhivananFailing to plan is Planning to fail |
 |
|
|
madhivanan
Premature Yak Congratulator
22864 Posts |
Posted - 2009-10-22 : 06:44:21
|
Dont need two updationsDECLARE @Numbers TABLE ( Number bigint NOT NULL PRIMARY KEY CLUSTERED, sno int) INSERT INTO @Numbers (number)SELECT 4488726695 UNION ALL SELECT 4488726696 UNION ALL SELECT 4488726697 UNION ALL SELECT 4488726698 UNION ALL SELECT 4488726600 UNION ALL SELECT 4488726701 UNION ALL SELECT 4488726702 UNION ALL SELECT 4488726703 UNION ALL SELECT 4488726704 UNION ALL SELECT 4488726705 UNION ALL SELECT 4488726706 UNION ALL SELECT 4488726707 UNION ALL SELECT 4488726708 UNION ALL SELECT 4488726709 UNION ALL SELECT 4488726710 UNION ALL SELECT 4488726715 UNION ALL SELECT 4488726717 UNION ALL SELECT 4488726718 UNION ALL SELECT 4488726719 UNION ALL SELECT 4488726720 UNION ALL SELECT 4488726721 UNION ALL SELECT 4488726722 UNION ALL SELECT 4488726723 UNION ALL SELECT 4488726724 UNION ALL SELECT 4488726725 UNION ALL SELECT 4488726726 UNION ALL SELECT 4488726727 UNION ALL SELECT 4488726728 UNION ALL SELECT 4488726729 UNION ALL SELECT 4488726730 UNION ALL SELECT 4488726731 UNION ALL SELECT 4488726732 UNION ALL SELECT 4488726733 UNION ALL SELECT 4488726734 UNION ALL SELECT 4488726739 UNION ALL SELECT 4488726743declare @sno int,@number bigintselect @sno=0, @number=0update @Numbersset sno=@sno,@sno=case when @number-number=-1 then @sno else @sno+1 end,@number=numberselect number,dense_rank() over (order by case when series_no<10 then 0 else series_no end)-1 as series_no from(select *,count(*) over (partition by sno) as series_no from @numbers) as torder by number MadhivananFailing to plan is Planning to fail |
 |
|
|
Lumbago
Norsk Yak Master
3271 Posts |
Posted - 2009-10-22 : 07:52:12
|
| I've done quite a bit of testing now and performance-wise it's a complete no-brainer: working through my 237k numbers takes 12secs using madhis script and khtans script takes 2.5 min for 5000 numbers. However madhis version is not returning the right results...the final query reports the same series_no for several of the series even though they are not connected i any way. I have tried but I haven't been able to figure out why yet.- Lumbagohttp://xkcd.com/327/ |
 |
|
|
Lumbago
Norsk Yak Master
3271 Posts |
Posted - 2009-10-22 : 08:29:36
|
I've fiddled a little with madhis version now and I have to say that this Quirky update method is really cool!! I had to do a slight change (in red) an now it actually seems to be working 100%...I'm simply amazed:...select number, series_id = dense_rank() over (order by case when series_count < 10 then 0 else sno end)-1from (select *, series_count = count(*) over (partition by sno) from @numbers) as torder by number I'm still trying to figure out what's going on though so this might not be the end of it but I have to say that it looks extremely promising! 237k numbers processed in 12 seconds _including_ transferring all the data to my client :)- Lumbagohttp://xkcd.com/327/ |
 |
|
|
madhivanan
Premature Yak Congratulator
22864 Posts |
Posted - 2009-10-22 : 08:52:00
|
quote: Originally posted by Lumbago I've fiddled a little with madhis version now and I have to say that this Quirky update method is really cool!! I had to do a slight change (in red) an now it actually seems to be working 100%...I'm simply amazed:...select number, series_id = dense_rank() over (order by case when series_count < 10 then 0 else sno end)-1from (select *, series_count = count(*) over (partition by sno) from @numbers) as torder by number I'm still trying to figure out what's going on though so this might not be the end of it but I have to say that it looks extremely promising! 237k numbers processed in 12 seconds _including_ transferring all the data to my client :)- Lumbagohttp://xkcd.com/327/
Thanks for your feedback MadhivananFailing to plan is Planning to fail |
 |
|
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2009-10-22 : 11:59:29
|
| Lumbago,Did you happen to test Ifors method? I'm not sure about execution time, but on a small sample data set it seems to about twice as effecient as the Quirky method. Just curious about the performance in the "real world." :) |
 |
|
|
Ifor
Aged Yak Warrior
700 Posts |
Posted - 2009-10-22 : 12:25:46
|
| I would also be interested in the performance of the method I posted.Also, the 'Quirky' method is fundamentally unrelational and not documented. ie It could be broken by a patch. |
 |
|
|
Next Page
|