| Author |
Topic |
|
jeff06
Posting Yak Master
166 Posts |
Posted - 2009-08-06 : 16:10:40
|
| table expected retid first last group1 4/1/2008 4/1/2008 1 4/1/2008 4/2/2008 1 4/1/2008 4/6/2008 1 4/2/2008 4/3/2008 1 4/2/2008 4/5/2008 1 4/7/2008 4/7/2008 1 4/9/2008 4/9/2008 1 4/9/2008 4/9/2008 1 4/21/2008 4/22/2008 1 4/23/2008 4/25/2008 2 4/21/2008 4/22/2008 2 4/22/2008 4/25/2008 expected retid first last group1 4/1/2008 4/1/2008 11 4/1/2008 4/2/2008 11 4/1/2008 4/6/2008 11 4/2/2008 4/3/2008 11 4/2/2008 4/5/2008 11 4/7/2008 4/7/2008 1 --because 4/7 next to 4/61 4/9/2008 4/9/2008 2 -- 4/9 is not next to 4/71 4/9/2008 4/9/2008 21 4/21/2008 4/22/2008 31 4/23/2008 4/25/2008 32 4/21/2008 4/22/2008 12 4/22/2008 4/25/2008 1-------The rule for each id as the peroid (first to last ) is consecutive it will be in on group. if it more than one day away it will be another group.How I can do it? Thanks. |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2009-08-06 : 16:19:43
|
quote: 1 4/2/2008 4/5/2008 11 4/7/2008 4/7/2008 1 --because 4/7 next to 4/6
Where is 4/6 ?When you say "if it more than one day away", do you mean the next [first] is more than one day from the previous [last] ?Be One with the OptimizerTG |
 |
|
|
jeff06
Posting Yak Master
166 Posts |
Posted - 2009-08-06 : 16:22:43
|
| because the third record1 4/1/2008 4/6/2008 spans to 4/6Thanks.Jeff |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2009-08-07 : 11:54:13
|
| I'm still not really clear on the rules. Do the two dates form a timespan and a group is defined by a collection of rows where if you chain together all the time-spans there is not more than 1 day break in "coverage" ?Be One with the OptimizerTG |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2009-08-07 : 12:00:36
|
So if this is a graphical representation of you data, blue is one group and red is another group?days: | | | | | | | | | | | | | | | | | | | | - --- --------------- ----- ------ --------- ------------ - Be One with the OptimizerTG |
 |
|
|
jeff06
Posting Yak Master
166 Posts |
Posted - 2009-08-07 : 12:00:56
|
| Exactly, TG!Thanks. |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2009-08-07 : 13:22:47
|
Ok, here is one (possible) method. Don't know how efficient (or correct) it will be over a lot of actual data but seems to work with your sample:set nocount ondeclare @t table (id int, [first] datetime, [last] datetime, [group] int)insert @t (id, [first], [last])select 1, '4/1/2008', '4/1/2008' union allselect 1, '4/1/2008', '4/2/2008' union allselect 1, '4/1/2008', '4/6/2008' union allselect 1, '4/2/2008', '4/3/2008' union allselect 1, '4/2/2008', '4/5/2008' union allselect 1, '4/7/2008', '4/7/2008' union allselect 1, '4/9/2008', '4/9/2008' union allselect 1, '4/9/2008', '4/9/2008' union allselect 1, '4/21/2008', '4/22/2008' union allselect 1, '4/23/2008', '4/25/2008' union allselect 2, '4/21/2008', '4/22/2008' union allselect 2, '4/22/2008', '4/25/2008';with base (id,[first],[last],rn) as( select id ,[first] ,[last] ,row_number() over (partition by id order by [first], [last]) as rn from @t ),cte (id,[first],[last], maxLast, rn, grp) as ( select id ,[first] ,[last] ,maxLast = [last] ,rn ,grp = 1 from base where rn = 1 union all select b.id ,b.[first] ,b.[last] ,maxLast = case when datediff(day, c.maxLast, b.[last]) > 0 then b.[last] else c.maxLast end ,b.rn ,grp = case when datediff(day, c.maxLast, b.[first] ) > 1 then c.grp + 1 else c.grp end from cte c join base b on b.id = c.id and b.rn = c.rn + 1)--select * from cte order by id, rnupdate t set t.[group] = c.grpfrom @t tjoin cte c on c.id = t.id and c.first = t.first and c.last = t.lastselect * from @toutput:id first last group----------- ----------------------- ----------------------- -----------1 2008-04-01 00:00:00.000 2008-04-01 00:00:00.000 11 2008-04-01 00:00:00.000 2008-04-02 00:00:00.000 11 2008-04-01 00:00:00.000 2008-04-06 00:00:00.000 11 2008-04-02 00:00:00.000 2008-04-03 00:00:00.000 11 2008-04-02 00:00:00.000 2008-04-05 00:00:00.000 11 2008-04-07 00:00:00.000 2008-04-07 00:00:00.000 11 2008-04-09 00:00:00.000 2008-04-09 00:00:00.000 21 2008-04-09 00:00:00.000 2008-04-09 00:00:00.000 21 2008-04-21 00:00:00.000 2008-04-22 00:00:00.000 31 2008-04-23 00:00:00.000 2008-04-25 00:00:00.000 32 2008-04-21 00:00:00.000 2008-04-22 00:00:00.000 12 2008-04-22 00:00:00.000 2008-04-25 00:00:00.000 1 Be One with the OptimizerTG |
 |
|
|
jholovacs
Posting Yak Master
163 Posts |
Posted - 2009-08-07 : 13:23:37
|
so what would you like the format of the output to look like? SELECT TOP 1 w.[name]FROM dbo.women wINNER JOIN dbo.inlaws i ON i.inlaw_id = w.parent_idWHERE i.net_worth > 10000000 AND i.status IN ('dead', 'dying') AND w.husband_id IS NULLORDER BY w.hotness_factor DESC |
 |
|
|
jeff06
Posting Yak Master
166 Posts |
Posted - 2009-08-07 : 16:27:48
|
| It is prefect Thank you very much TG and jholovacs.I appreciate your help.Jeff |
 |
|
|
jeff06
Posting Yak Master
166 Posts |
Posted - 2009-08-20 : 16:49:53
|
| I have a new problems.I have table with more than 1 million records. Tg's method does not seem to work in this case because the table is oversized. Is there a better way to deal with large table?Thanks.Jeff |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2009-08-20 : 17:13:57
|
| >>does not seem to workDoes it error out or wrong results or what?Post the structure of your actual table including Primary key and indexes.Are you updating the [group] column in your table or are you selecting the rows out?(just an update of more than a million rows is going to take time)Is this a one-time process or do you have to do this regularly? If so, how often? And do you need to re-run the whole table each time or can you just to new rows?Be One with the OptimizerTG |
 |
|
|
jeff06
Posting Yak Master
166 Posts |
Posted - 2009-08-20 : 17:37:01
|
| ;with base (pid, firstdateofsrv, lastdateofsrv,rn) as ( select pid, firstdateofsrv, lastdateofsrv, row_number() over (partition by pid order by firstdateofsrv, lastdateofsrv) as rn from #BSCIP ) ,cte (pid, firstdateofsrv, lastdateofsrv,maxlastdateofsrv,rn, grp) as ( select pid, firstdateofsrv, lastdateofsrv, maxlastdateofsrv=lastdateofsrv, rn, grp=1 from base where rn=1 union all --recursive part select b.pid, b.firstdateofsrv, b.lastdateofsrv, maxlastdateofsrv= case when datediff(day, c.maxlastdateofsrv, b.lastdateofsrv)>0 then b.lastdateofsrv else c.maxlastdateofsrv end, b.rn, grp= case when datediff(day, c.maxlastdateofsrv, b.firstdateofsrv)>1 then c.grp+1 else c.grp end from cte c join base b on b.pid=c.pid and b.rn=c.rn+1 ) select * into #cte from cte Thanks for response.Above is my code.When I run this code I got error msg like max recursive is 100then I add to the code. option (maxrecursion 1000000)But I still get error msg:The value 1000000 specified for the MAXRECURSION option exceeds the allowed maximum of 32767.Thanks again for your help. |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2009-08-21 : 10:40:56
|
| So that means you have some PIDs (patient IDs ?)in there with more than 32K service date rows? At one day each that is like 90 years of service!It that is true than we obviously need a different method. But if that is due to a test patient or something then we should just exclude that particular pid from the query.Be One with the OptimizerTG |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2009-08-21 : 11:06:27
|
| Run this and let's see the top 10. Maybe there are a bunch of NULLs or something.select top 10 pid, count(*) [pid_count]from from #BSCIPgroup by pidorder by 2 descAlso, since you have pre-generated a temp table ([#BSCIP]) we can avoid the [base] cte by putting the row_number() code in there.Be One with the OptimizerTG |
 |
|
|
jeff06
Posting Yak Master
166 Posts |
Posted - 2009-08-21 : 16:49:02
|
| Thanks TG!I believe I make some mistake in the temp file.Now the program works fine.Thanks you so much for you kind help.Jeff |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2009-08-21 : 16:55:19
|
You're welcome! That was a weird problem Glad I could help.Be One with the OptimizerTG |
 |
|
|
|