Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Moving Records

Author  Topic 

dataczar
Starting Member

18 Posts

Posted - 2008-10-06 : 17:59:56
I am trying to set the claim_id equal to the original if the previous rows end_date < 180 days from the current rows begin_date

Here is a SQL statement with some sample data:

declare @test table
(
customer_id int,
claim_id int,
begin_date datetime,
end_date datetime
)

insert into @test
select 1,1, '2005-09-12 00:00:00.000', '2005-09-12 00:00:00.000' union all
select 1,2, '2005-10-10 00:00:00.000', '2005-10-10 00:00:00.000' union all
select 1,3, '2005-12-06 00:00:00.000', '2006-01-16 00:00:00.000' union all
select 1,4, '2007-07-02 00:00:00.000', '2007-07-23 00:00:00.000' union all
select 1,5, '2007-10-05 00:00:00.000', NULL


select *
from @test

Here is the result: (I took away the time from end_date so it would fit)

5 row(s) affected)
customer_id claim_id begin_date end_date
----------- ----------- ----------------------- -----------------------
1 1 2005-09-12 00:00:00.000 2005-09-12
1 2 2005-10-10 00:00:00.000 2005-10-10
1 3 2005-12-06 00:00:00.000 2006-01-16
1 4 2007-07-02 00:00:00.000 2007-07-23
1 5 2007-10-05 00:00:00.000 NULL

(5 row(s) affected)


I would like my result to look like this: the items in red are what I am trying to get to.

customer_id claim_id begin_date end_date
----------- ----------- ----------------------- -----------------------
1 1 2005-09-12 00:00:00.000 2005-09-12
1 12005-10-10 00:00:00.000 2005-10-10
1 12005-12-06 00:00:00.000 2006-01-16
1 4 2007-07-02 00:00:00.000 2007-07-23
1 42007-10-05 00:00:00.000 NULL

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2008-10-06 : 18:43:19
I am not at a computer with sql server so I couldn't test this code but try this:

;with cte
as
(
select customer_id
,claim_id
,begin_date
,end_date
,rn = row_number() over (partition by customer_id order by begin_date)
from @test
)
select curr.customer_id
,case
when datediff(day, prev.end_date, curr.begin_date) < 180 then prev.claim_id
else curr.claim_id
end
,curr.begin_date
,curr.end_date
from cte curr
left join cte prev
on prev.customer_id = curr.customer_id
and prev.rn+1 = curr.rn


EDIT:
woops, I see a problem already...Hold on a sec, I'll connect to work and get this right...

Be One with the Optimizer
TG
Go to Top of Page

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2008-10-06 : 18:58:05
>>I am trying to set the claim_id equal to the original if the previous rows end_date < 180 days from the current rows begin_date

Does the current row's begin_date need to be within 180 days of the PREV or the ORIG row?

ie: given these rows (4 rows each row 5 months later than the previous row. What should the claim_ids be?
customer_id claim_id begin_date end_date
----------- ----------- ----------------------- -----------------------
1 1 2005-01-12 00:00:00.000 2005-01-12
1 2 2005-06-12 00:00:00.000 2005-06-12
1 3 2005-11-12 00:00:00.000 2005-11-12
1 4 2006-04-12 00:00:00.000 2005-04-12


Be One with the Optimizer
TG
Go to Top of Page

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2008-10-06 : 19:18:43
This works with your original example. Let me know if it does't work base on my previous question:

;with cte
as
(
select customer_id
,claim_id
,begin_date
,end_date
,datediff(day, 0, begin_date) / 180 grp
from @test
)
select c.customer_id
,d.claim_id
,c.begin_date
,c.end_date
from (
select customer_id
,grp
,min(claim_id) claim_id
from cte
group by customer_id
,grp
) d
join cte c
on c.customer_id = d.customer_id
and c.grp = d.grp


Be One with the Optimizer
TG
Go to Top of Page

dataczar
Starting Member

18 Posts

Posted - 2008-10-06 : 21:55:21
TG,

That second one you posted was exactly what I was trying to do.

I appreciate the help

Go to Top of Page

dataczar
Starting Member

18 Posts

Posted - 2008-10-06 : 22:05:41
TG

One other question I have is what if the min(claim_id) is not the first one in the group. I know it is in my example, but it is not in the actual data that I am working with.
Go to Top of Page

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2008-10-06 : 23:04:33
Can you post an example to illustrate the problem?

EDIT:
while you're at it, post the desired output for my sample data.

Be One with the Optimizer
TG
Go to Top of Page

dataczar
Starting Member

18 Posts

Posted - 2008-10-07 : 10:16:50
Your solution worked for the example that I put up there. (See my fist post for the desired solution).

When I transferred it to the real data, I had to use a row_number() over(partition by customer_id ,grp order by customer_id, begin_date). I chose the row where the row_number was one.

When I ran the solution against the larger data set, the grouping logic failed on some records.

Here is an example.

declare @test table
(
customer_id int,
claim_id int,
begin_date datetime,
end_date datetime
)

insert into @test
select 1,1, '2007-11-05 00:00:00.000', '2008-03-31 00:00:00.000' union all
select 1,2, '2008-04-30 00:00:00.000', NULL


select *
from @test


Using your solution the two grp are different: Claim_id 1 gets a grp of 218 , while claim_id 2 gets a grp of 219

customer_id claim_id begin_date end_date
----------- ----------- ----------------------- -----------------------
1 1 2007-11-05 00:00:00.000 2008-03-31 00:00:00.000
1 2 2008-04-30 00:00:00.000 NULL



Go to Top of Page

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2008-10-07 : 17:52:17
Not sure if you were saying you still need help or not but I have another solution that solves your second example:

declare @test table
(
customer_id int,
claim_id int,
begin_date datetime,
end_date datetime
)

--original sample data
insert into @test
select 1,1, '2005-09-12 00:00:00.000', '2005-09-12 00:00:00.000' union all
select 1,2, '2005-10-10 00:00:00.000', '2005-10-10 00:00:00.000' union all
select 1,3, '2005-12-06 00:00:00.000', '2006-01-16 00:00:00.000' union all
select 1,4, '2007-07-02 00:00:00.000', '2007-07-23 00:00:00.000' union all
select 1,5, '2007-10-05 00:00:00.000', NULL

--my sample data (you never responded with what is the correct values here)
insert @test
select 2, 1, '2005-01-12 00:00:00.000', '2005-01-12' union all
select 2, 2, '2005-06-12 00:00:00.000', '2005-06-12' union all
select 2, 3, '2005-11-12 00:00:00.000', '2005-11-12' union all
select 2, 4, '2006-04-12 00:00:00.000', '2005-04-12'

--your second sample data
insert into @test
select 3,1, '2007-11-05 00:00:00.000', '2008-03-31 00:00:00.000' union all
select 3,2, '2008-04-30 00:00:00.000', NULL


;with pre ( customer_id, claim_id, begin_date, end_date, rn)
as
(
select customer_id
,claim_id
,begin_date
,end_date
,row_number() over (partition by customer_id order by claim_id)
from @test
)
,recurs ( customer_id, claim_id, begin_date, end_date, rn, orig_end_date, orig_claim_id)
as
(
select customer_id
,claim_id
,begin_date
,end_date
,rn
,end_date
,claim_id
from pre
where rn = 1
union all
select p.customer_id
,p.claim_id
,p.begin_date
,p.end_date
,p.rn
,case when datediff(day, r.orig_end_date, p.begin_date) <= 180 then r.orig_end_date else p.end_date end
,case when datediff(day, r.orig_end_date, p.begin_date) <= 180 then r.orig_claim_id else p.claim_id end
from recurs r
join pre p on p.customer_id = r.customer_id
and p.rn = r.rn+1
)
select customer_id
,orig_claim_id
,begin_date
,end_date
from recurs
order by customer_id
,begin_date


output:

customer_id orig_claim_id begin_date end_date
----------- ------------- ----------------------- -----------------------
1 1 2005-09-12 00:00:00.000 2005-09-12 00:00:00.000 --original sample data
1 1 2005-10-10 00:00:00.000 2005-10-10 00:00:00.000
1 1 2005-12-06 00:00:00.000 2006-01-16 00:00:00.000
1 4 2007-07-02 00:00:00.000 2007-07-23 00:00:00.000
1 4 2007-10-05 00:00:00.000 NULL

2 1 2005-01-12 00:00:00.000 2005-01-12 00:00:00.000 --my sample data (you never responded with correct values)
2 1 2005-06-12 00:00:00.000 2005-06-12 00:00:00.000
2 3 2005-11-12 00:00:00.000 2005-11-12 00:00:00.000
2 3 2006-04-12 00:00:00.000 2005-04-12 00:00:00.000

3 1 2007-11-05 00:00:00.000 2008-03-31 00:00:00.000 --your second sample data
3 1 2008-04-30 00:00:00.000 NULL


Be One with the Optimizer
TG
Go to Top of Page

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2008-10-07 : 18:42:03
simpler solution but I get a different answer for my sample data.

select t.customer_id
,isNull(ca.claim_id, t.claim_id) claim_id
,t.begin_date
,t.end_date
from @test t
cross apply (
select min(claim_id) claim_id
from @test
where customer_id = t.customer_id
and begin_date < t.begin_date
and datediff(day, end_date, t.begin_date) <= 180
) ca


Be One with the Optimizer
TG
Go to Top of Page

dataczar
Starting Member

18 Posts

Posted - 2008-10-09 : 10:08:30
TG,

Here is what the output should be: This is what you have except for the rersult from your data.

I am wondering if I need to use a cursor for this. What do you think?

output:

customer_id orig_claim_id begin_date end_date
----------- ------------- ----------------------- -----------------------
1 1 2005-09-12 00:00:00.000 2005-09-12 00:00:00.000 --original sample data
1 1 2005-10-10 00:00:00.000 2005-10-10 00:00:00.000
1 1 2005-12-06 00:00:00.000 2006-01-16 00:00:00.000
1 4 2007-07-02 00:00:00.000 2007-07-23 00:00:00.000
1 4 2007-10-05 00:00:00.000 NULL

2 1 2005-01-12 00:00:00.000 2005-01-12 00:00:00.000 --my sample data (you never responded with correct values)
2 1 2005-06-12 00:00:00.000 2005-06-12 00:00:00.000
2 1 2005-11-12 00:00:00.000 2005-11-12 00:00:00.000
2 1 2006-04-12 00:00:00.000 2005-04-12 00:00:00.000

3 1 2007-11-05 00:00:00.000 2008-03-31 00:00:00.000 --your second sample data
3 1 2008-04-30 00:00:00.000 NULL
Go to Top of Page

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2008-10-09 : 10:32:21
>>I am wondering if I need to use a cursor for this. What do you think?
Nope, don't think so.

I just wanted to hear the expected values for my data because your orig criteria was a little ambigious as the the definition of "original claim_id". original is the earilies occurance of an unbroken chain of claim_id where each claim is no more than 180 days from the previous claim.

Just requires a tweak to my second solution:

;with pre ( customer_id, claim_id, begin_date, end_date, rn)
as
(
select customer_id
,claim_id
,begin_date
,end_date
,row_number() over (partition by customer_id order by claim_id)
from @test
)
,recurs ( customer_id, claim_id, begin_date, end_date, rn, orig_end_date, orig_claim_id)
as
(
select customer_id
,claim_id
,begin_date
,end_date
,rn
,end_date
,claim_id
from pre
where rn = 1
union all
select p.customer_id
,p.claim_id
,p.begin_date
,p.end_date
,p.rn
,case when datediff(day, r.end_date, p.begin_date) <= 180 then r.orig_end_date else p.end_date end
,case when datediff(day, r.end_date, p.begin_date) <= 180 then r.orig_claim_id else p.claim_id end
from recurs r
join pre p on p.customer_id = r.customer_id
and p.rn = r.rn+1
)
select customer_id
,orig_claim_id
,begin_date
,end_date
from recurs
order by customer_id
,begin_date

output:
customer_id orig_claim_id begin_date end_date
----------- ------------- ----------------------- -----------------------
1 1 2005-09-12 00:00:00.000 2005-09-12 00:00:00.000
1 1 2005-10-10 00:00:00.000 2005-10-10 00:00:00.000
1 1 2005-12-06 00:00:00.000 2006-01-16 00:00:00.000
1 4 2007-07-02 00:00:00.000 2007-07-23 00:00:00.000
1 4 2007-10-05 00:00:00.000 NULL
2 1 2005-01-12 00:00:00.000 2005-01-12 00:00:00.000
2 1 2005-06-12 00:00:00.000 2005-06-12 00:00:00.000
2 1 2005-11-12 00:00:00.000 2005-11-12 00:00:00.000
2 1 2006-04-12 00:00:00.000 2005-04-12 00:00:00.000
3 1 2007-11-05 00:00:00.000 2008-03-31 00:00:00.000
3 1 2008-04-30 00:00:00.000 NULL


Be One with the Optimizer
TG
Go to Top of Page

dataczar
Starting Member

18 Posts

Posted - 2008-10-09 : 14:20:27
Sorry if I was not clear on the expected outcome. I am testing the solution of the real full data set (around 5,000 rows) and it is coming back in about 30 seconds. I will be looking at several to make sure everything is okay.

Thank you for your help with this.
Go to Top of Page

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2008-10-09 : 15:35:57
You're welcome! just post any examples that "break" with the latest logic...

30 seconds seems kind of slow. is your actual query dealing with lots more columns or additional joins? What is the nature of the data? ie: how many customer_ids and about how many claims per customer?

Be One with the Optimizer
TG
Go to Top of Page

dataczar
Starting Member

18 Posts

Posted - 2008-10-09 : 15:42:50
I added no more columns to the logic. I only change the input from using test data to the real data (That query comes back immediately. There are 2024 customers with a total of 4,806 claims (each customer has between 2 and 7 claims (70% have just 2 claims).
Go to Top of Page

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2008-10-09 : 15:52:56
hmm - I would expect that distribution to be faster. The recursive part is only repeating about 7 times (largest number of claims for any customer) And with only 4800 claims total it should be pretty quick.

If I have time i may set up a table sample data and see if I get similar results.

Be One with the Optimizer
TG
Go to Top of Page
   

- Advertisement -