Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
dataczar
Starting Member
18 Posts |
Posted - 2008-10-06 : 17:59:56
|
| I am trying to set the claim_id equal to the original if the previous rows end_date < 180 days from the current rows begin_dateHere is a SQL statement with some sample data:declare @test table(customer_id int,claim_id int,begin_date datetime,end_date datetime)insert into @testselect 1,1, '2005-09-12 00:00:00.000', '2005-09-12 00:00:00.000' union allselect 1,2, '2005-10-10 00:00:00.000', '2005-10-10 00:00:00.000' union allselect 1,3, '2005-12-06 00:00:00.000', '2006-01-16 00:00:00.000' union allselect 1,4, '2007-07-02 00:00:00.000', '2007-07-23 00:00:00.000' union allselect 1,5, '2007-10-05 00:00:00.000', NULLselect *from @testHere is the result: (I took away the time from end_date so it would fit)5 row(s) affected)customer_id claim_id begin_date end_date----------- ----------- ----------------------- -----------------------1 1 2005-09-12 00:00:00.000 2005-09-12 1 2 2005-10-10 00:00:00.000 2005-10-10 1 3 2005-12-06 00:00:00.000 2006-01-16 1 4 2007-07-02 00:00:00.000 2007-07-23 1 5 2007-10-05 00:00:00.000 NULL(5 row(s) affected)I would like my result to look like this: the items in red are what I am trying to get to.customer_id claim_id begin_date end_date----------- ----------- ----------------------- -----------------------1 1 2005-09-12 00:00:00.000 2005-09-12 1 12005-10-10 00:00:00.000 2005-10-10 1 12005-12-06 00:00:00.000 2006-01-16 1 4 2007-07-02 00:00:00.000 2007-07-23 1 42007-10-05 00:00:00.000 NULL |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2008-10-06 : 18:43:19
|
I am not at a computer with sql server so I couldn't test this code but try this:;with cteas(select customer_id ,claim_id ,begin_date ,end_date ,rn = row_number() over (partition by customer_id order by begin_date)from @test)select curr.customer_id ,case when datediff(day, prev.end_date, curr.begin_date) < 180 then prev.claim_id else curr.claim_id end ,curr.begin_date ,curr.end_datefrom cte currleft join cte prev on prev.customer_id = curr.customer_id and prev.rn+1 = curr.rn EDIT:woops, I see a problem already...Hold on a sec, I'll connect to work and get this right...Be One with the OptimizerTG |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2008-10-06 : 18:58:05
|
| >>I am trying to set the claim_id equal to the original if the previous rows end_date < 180 days from the current rows begin_dateDoes the current row's begin_date need to be within 180 days of the PREV or the ORIG row? ie: given these rows (4 rows each row 5 months later than the previous row. What should the claim_ids be?customer_id claim_id begin_date end_date----------- ----------- ----------------------- -----------------------1 1 2005-01-12 00:00:00.000 2005-01-12 1 2 2005-06-12 00:00:00.000 2005-06-12 1 3 2005-11-12 00:00:00.000 2005-11-12 1 4 2006-04-12 00:00:00.000 2005-04-12 Be One with the OptimizerTG |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2008-10-06 : 19:18:43
|
This works with your original example. Let me know if it does't work base on my previous question:;with cteas(select customer_id ,claim_id ,begin_date ,end_date ,datediff(day, 0, begin_date) / 180 grpfrom @test)select c.customer_id ,d.claim_id ,c.begin_date ,c.end_datefrom ( select customer_id ,grp ,min(claim_id) claim_id from cte group by customer_id ,grp ) djoin cte c on c.customer_id = d.customer_id and c.grp = d.grp Be One with the OptimizerTG |
 |
|
|
dataczar
Starting Member
18 Posts |
Posted - 2008-10-06 : 21:55:21
|
TG,That second one you posted was exactly what I was trying to do.I appreciate the help |
 |
|
|
dataczar
Starting Member
18 Posts |
Posted - 2008-10-06 : 22:05:41
|
| TGOne other question I have is what if the min(claim_id) is not the first one in the group. I know it is in my example, but it is not in the actual data that I am working with. |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2008-10-06 : 23:04:33
|
| Can you post an example to illustrate the problem?EDIT:while you're at it, post the desired output for my sample data.Be One with the OptimizerTG |
 |
|
|
dataczar
Starting Member
18 Posts |
Posted - 2008-10-07 : 10:16:50
|
| Your solution worked for the example that I put up there. (See my fist post for the desired solution).When I transferred it to the real data, I had to use a row_number() over(partition by customer_id ,grp order by customer_id, begin_date). I chose the row where the row_number was one.When I ran the solution against the larger data set, the grouping logic failed on some records.Here is an example.declare @test table(customer_id int,claim_id int,begin_date datetime,end_date datetime)insert into @testselect 1,1, '2007-11-05 00:00:00.000', '2008-03-31 00:00:00.000' union allselect 1,2, '2008-04-30 00:00:00.000', NULL select *from @testUsing your solution the two grp are different: Claim_id 1 gets a grp of 218 , while claim_id 2 gets a grp of 219customer_id claim_id begin_date end_date----------- ----------- ----------------------- -----------------------1 1 2007-11-05 00:00:00.000 2008-03-31 00:00:00.0001 2 2008-04-30 00:00:00.000 NULL |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2008-10-07 : 17:52:17
|
Not sure if you were saying you still need help or not but I have another solution that solves your second example:declare @test table(customer_id int,claim_id int,begin_date datetime,end_date datetime)--original sample datainsert into @testselect 1,1, '2005-09-12 00:00:00.000', '2005-09-12 00:00:00.000' union allselect 1,2, '2005-10-10 00:00:00.000', '2005-10-10 00:00:00.000' union allselect 1,3, '2005-12-06 00:00:00.000', '2006-01-16 00:00:00.000' union allselect 1,4, '2007-07-02 00:00:00.000', '2007-07-23 00:00:00.000' union allselect 1,5, '2007-10-05 00:00:00.000', NULL--my sample data (you never responded with what is the correct values here)insert @testselect 2, 1, '2005-01-12 00:00:00.000', '2005-01-12' union allselect 2, 2, '2005-06-12 00:00:00.000', '2005-06-12' union all select 2, 3, '2005-11-12 00:00:00.000', '2005-11-12' union allselect 2, 4, '2006-04-12 00:00:00.000', '2005-04-12' --your second sample datainsert into @testselect 3,1, '2007-11-05 00:00:00.000', '2008-03-31 00:00:00.000' union allselect 3,2, '2008-04-30 00:00:00.000', NULL ;with pre ( customer_id, claim_id, begin_date, end_date, rn)as (select customer_id ,claim_id ,begin_date ,end_date ,row_number() over (partition by customer_id order by claim_id)from @test),recurs ( customer_id, claim_id, begin_date, end_date, rn, orig_end_date, orig_claim_id) as (select customer_id ,claim_id ,begin_date ,end_date ,rn ,end_date ,claim_idfrom prewhere rn = 1union allselect p.customer_id ,p.claim_id ,p.begin_date ,p.end_date ,p.rn ,case when datediff(day, r.orig_end_date, p.begin_date) <= 180 then r.orig_end_date else p.end_date end ,case when datediff(day, r.orig_end_date, p.begin_date) <= 180 then r.orig_claim_id else p.claim_id endfrom recurs rjoin pre p on p.customer_id = r.customer_id and p.rn = r.rn+1 )select customer_id ,orig_claim_id ,begin_date ,end_datefrom recursorder by customer_id ,begin_dateoutput:customer_id orig_claim_id begin_date end_date----------- ------------- ----------------------- -----------------------1 1 2005-09-12 00:00:00.000 2005-09-12 00:00:00.000 --original sample data1 1 2005-10-10 00:00:00.000 2005-10-10 00:00:00.0001 1 2005-12-06 00:00:00.000 2006-01-16 00:00:00.0001 4 2007-07-02 00:00:00.000 2007-07-23 00:00:00.0001 4 2007-10-05 00:00:00.000 NULL2 1 2005-01-12 00:00:00.000 2005-01-12 00:00:00.000 --my sample data (you never responded with correct values)2 1 2005-06-12 00:00:00.000 2005-06-12 00:00:00.0002 3 2005-11-12 00:00:00.000 2005-11-12 00:00:00.0002 3 2006-04-12 00:00:00.000 2005-04-12 00:00:00.0003 1 2007-11-05 00:00:00.000 2008-03-31 00:00:00.000 --your second sample data3 1 2008-04-30 00:00:00.000 NULL Be One with the OptimizerTG |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2008-10-07 : 18:42:03
|
simpler solution but I get a different answer for my sample data.select t.customer_id ,isNull(ca.claim_id, t.claim_id) claim_id ,t.begin_date ,t.end_datefrom @test tcross apply ( select min(claim_id) claim_id from @test where customer_id = t.customer_id and begin_date < t.begin_date and datediff(day, end_date, t.begin_date) <= 180) ca Be One with the OptimizerTG |
 |
|
|
dataczar
Starting Member
18 Posts |
Posted - 2008-10-09 : 10:08:30
|
| TG,Here is what the output should be: This is what you have except for the rersult from your data.I am wondering if I need to use a cursor for this. What do you think?output:customer_id orig_claim_id begin_date end_date----------- ------------- ----------------------- -----------------------1 1 2005-09-12 00:00:00.000 2005-09-12 00:00:00.000 --original sample data1 1 2005-10-10 00:00:00.000 2005-10-10 00:00:00.0001 1 2005-12-06 00:00:00.000 2006-01-16 00:00:00.0001 4 2007-07-02 00:00:00.000 2007-07-23 00:00:00.0001 4 2007-10-05 00:00:00.000 NULL2 1 2005-01-12 00:00:00.000 2005-01-12 00:00:00.000 --my sample data (you never responded with correct values)2 1 2005-06-12 00:00:00.000 2005-06-12 00:00:00.0002 1 2005-11-12 00:00:00.000 2005-11-12 00:00:00.0002 1 2006-04-12 00:00:00.000 2005-04-12 00:00:00.0003 1 2007-11-05 00:00:00.000 2008-03-31 00:00:00.000 --your second sample data3 1 2008-04-30 00:00:00.000 NULL |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2008-10-09 : 10:32:21
|
>>I am wondering if I need to use a cursor for this. What do you think?Nope, don't think so.I just wanted to hear the expected values for my data because your orig criteria was a little ambigious as the the definition of "original claim_id". original is the earilies occurance of an unbroken chain of claim_id where each claim is no more than 180 days from the previous claim.Just requires a tweak to my second solution:;with pre ( customer_id, claim_id, begin_date, end_date, rn)as (select customer_id ,claim_id ,begin_date ,end_date ,row_number() over (partition by customer_id order by claim_id)from @test),recurs ( customer_id, claim_id, begin_date, end_date, rn, orig_end_date, orig_claim_id) as (select customer_id ,claim_id ,begin_date ,end_date ,rn ,end_date ,claim_idfrom prewhere rn = 1union allselect p.customer_id ,p.claim_id ,p.begin_date ,p.end_date ,p.rn ,case when datediff(day, r.end_date, p.begin_date) <= 180 then r.orig_end_date else p.end_date end ,case when datediff(day, r.end_date, p.begin_date) <= 180 then r.orig_claim_id else p.claim_id endfrom recurs rjoin pre p on p.customer_id = r.customer_id and p.rn = r.rn+1 )select customer_id ,orig_claim_id ,begin_date ,end_datefrom recursorder by customer_id ,begin_dateoutput:customer_id orig_claim_id begin_date end_date----------- ------------- ----------------------- -----------------------1 1 2005-09-12 00:00:00.000 2005-09-12 00:00:00.0001 1 2005-10-10 00:00:00.000 2005-10-10 00:00:00.0001 1 2005-12-06 00:00:00.000 2006-01-16 00:00:00.0001 4 2007-07-02 00:00:00.000 2007-07-23 00:00:00.0001 4 2007-10-05 00:00:00.000 NULL2 1 2005-01-12 00:00:00.000 2005-01-12 00:00:00.0002 1 2005-06-12 00:00:00.000 2005-06-12 00:00:00.0002 1 2005-11-12 00:00:00.000 2005-11-12 00:00:00.0002 1 2006-04-12 00:00:00.000 2005-04-12 00:00:00.0003 1 2007-11-05 00:00:00.000 2008-03-31 00:00:00.0003 1 2008-04-30 00:00:00.000 NULL Be One with the OptimizerTG |
 |
|
|
dataczar
Starting Member
18 Posts |
Posted - 2008-10-09 : 14:20:27
|
| Sorry if I was not clear on the expected outcome. I am testing the solution of the real full data set (around 5,000 rows) and it is coming back in about 30 seconds. I will be looking at several to make sure everything is okay.Thank you for your help with this. |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2008-10-09 : 15:35:57
|
| You're welcome! just post any examples that "break" with the latest logic...30 seconds seems kind of slow. is your actual query dealing with lots more columns or additional joins? What is the nature of the data? ie: how many customer_ids and about how many claims per customer?Be One with the OptimizerTG |
 |
|
|
dataczar
Starting Member
18 Posts |
Posted - 2008-10-09 : 15:42:50
|
| I added no more columns to the logic. I only change the input from using test data to the real data (That query comes back immediately. There are 2024 customers with a total of 4,806 claims (each customer has between 2 and 7 claims (70% have just 2 claims). |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2008-10-09 : 15:52:56
|
| hmm - I would expect that distribution to be faster. The recursive part is only repeating about 7 times (largest number of claims for any customer) And with only 4800 claims total it should be pretty quick.If I have time i may set up a table sample data and see if I get similar results.Be One with the OptimizerTG |
 |
|
|
|
|
|
|
|