Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Calculate days in overlapping Date Range

Author  Topic 

cvipin
Yak Posting Veteran

51 Posts

Posted - 2013-03-22 : 13:08:34
Hi,

I have following data. There are start and end dates for each hold type. I need to calculate the number of days a hold was placed on a loan. The priority of hold is BK, LM then OTH. Meaning, if I calculate number of days LM hold was placed on loan, I need to subtract any overlapping days with BK. If calculating for OTH, then overlap of LM and BK needs to be checked.

DECLARE @Table TABLE
(
LoanNumber varchar(25),
HoldType varchar(3),
StartDate date,
EndDate date
)
INSERT INTO @Table VALUES ('1234', 'BK', '8/10/2010', '8/20/2010')
INSERT INTO @Table VALUES ('1234', 'BK', '8/25/2010', '9/10/2010')
INSERT INTO @Table VALUES ('1234', 'BK', '10/20/2010', '10/30/2010')
INSERT INTO @Table VALUES ('1234', 'LM', '7/14/2010', '7/25/2010')
INSERT INTO @Table VALUES ('1234', 'LM', '8/14/2010', '9/15/2010')
INSERT INTO @Table VALUES ('1234', 'LM', '9/20/2010', '10/25/2010')
INSERT INTO @Table VALUES ('1234', 'OTH', '7/10/2010', '7/20/2010')
INSERT INTO @Table VALUES ('1234', 'OTH', '8/22/2010', '9/12/2010')
INSERT INTO @Table VALUES ('1234', 'OTH', '10/22/2010', '11/30/2010')
INSERT INTO @Table VALUES ('5678', 'BK', '8/07/2010', '8/18/2010')
INSERT INTO @Table VALUES ('5678', 'BK', '8/24/2010', '9/13/2010')
INSERT INTO @Table VALUES ('5678', 'BK', '10/21/2010', '10/31/2010')
INSERT INTO @Table VALUES ('5678', 'LM', '7/12/2010', '7/29/2010')
INSERT INTO @Table VALUES ('5678', 'LM', '8/14/2010', '9/15/2010')
INSERT INTO @Table VALUES ('5678', 'LM', '9/20/2010', '10/25/2010')
INSERT INTO @Table VALUES ('5678', 'OTH', '7/10/2010', '7/20/2010')
INSERT INTO @Table VALUES ('5678', 'OTH', '8/22/2010', '9/12/2010')
INSERT INTO @Table VALUES ('5678', 'OTH', '10/22/2010', '11/30/2010')

Output expected:
LoanNumber BKCount LMCount OTHCount
1234 36 51 35
5678 41 56 32

I implemented the same with the use of DateDim as below, but this has become inefficient as the number of records are in millions.

;WITH CTE_All AS
(

SELECT DISTINCT LoanNumber, HoldType, DD.date
FROM @Table T
JOIN SDM.DateDim DD WITH(NOLOCK) ON DD.Date > T.StartDate AND DD.Date <= T.EndDate
)

SELECT LoanNumber, SUM(BKCount) BKCount, SUM(LMCount) LMCount, SUM(OTHCount) OTHCount
FROM
(
SELECT LoanNumber, COUNT(1) BKCount, 0 LMCount, 0 OTHCount
FROM CTE_All WHERE HoldType = 'BK'
GROUP BY LoanNumber

UNION ALL

SELECT LoanNumber, 0 BKCount, COUNT(1) LMCount, 0 OTHCount
FROM
(
SELECT LoanNumber, Date FROM CTE_All WHERE HoldType = 'LM'
EXCEPT
SELECT LoanNumber, Date FROM CTE_All WHERE HoldType = 'BK'
)T
GROUP BY LoanNumber

UNION ALL

SELECT LoanNumber, 0 BKCount, 0 LMCount, COUNT(1) OTHCount
FROM
(
SELECT LoanNumber, Date FROM CTE_All WHERE HoldType = 'OTH'
EXCEPT
SELECT DISTINCT LoanNumber, Date FROM CTE_All WHERE HoldType IN ('BK','LM')
)T
GROUP BY LoanNumber
)T
GROUP BY LoanNumber

Can you let me know if there is any efficient way to perform this task.

Thanks
Vipin

yosiasz
Master Smack Fu Yak Hacker

1635 Posts

Posted - 2013-03-22 : 16:37:13
So are you saying your DateTime has millions of rows? do you have any sort of indexing on DD.Date?

instead of CTE and table variables use #temptables with indexes. this can improve your performance. also run a small sample of your query using execution plan to sniff out issues

what do you gain by using DimDate

SELECT DISTINCT LoanNumber, HoldType, DD.date
FROM @Table T
JOIN SDM.DateDim DD WITH(NOLOCK) ON DD.Date > T.StartDate AND DD.Date <= T.EndDate

I do not see any benefits other than grabbing dd.date which you already have in T. that part makes no sense.
<><><><><><><><><><><><><><><><><>
If you don't have the passion to help people, you have no passion
Go to Top of Page

cvipin
Yak Posting Veteran

51 Posts

Posted - 2013-03-22 : 18:33:17
DateDim contains daily dates from 1/1/1992 till current year end. Using SDM.DateDim DD WITH(NOLOCK) ON DD.Date > T.StartDate AND DD.Date <= T.EndDate, I get daily dates for that range and it becomes easier to remove any overlap.

Ex: For BK: '8/10/2010', '8/20/2010', with above join I will get 10 rows for 8/11/2010, 8/12/2010,....8/20/2010
For LM: '7/14/2010', '7/25/2010' I get 7/15/2010, 7/16/2010....7/25/2010. So for one date range my logic creates those many rows (10 and 11 in this ex.).

Now to get net LM days I use except to remove any overlapping days.

SELECT LoanNumber, Date FROM CTE_All WHERE HoldType = 'LM'
EXCEPT
SELECT LoanNumber, Date FROM CTE_All WHERE HoldType = 'BK'

I have lot of loans with date ranges and my logic is causing the record count go so high (25 millions).

Let me know if there is any efficient way to tackle this requirement. I am using # table in my actual query.

Thanks
Vipin
Go to Top of Page

James K
Master Smack Fu Yak Hacker

3873 Posts

Posted - 2013-03-22 : 18:41:55
Here somthing to get you started. It parses, but I have not tested it. I know it has a at least one problem - that it will give you the wrong results if the only type of hold was LM , and the number of days of the hold happened to be a multiple of 10. That logic can be fixed, and if it works correctly for other cases, it should be faster than the CTE approach
select
LoanNumber,
SUM(case when N%10 > 0 then 1 else 0 end) as BKCount,
SUM(case when N%10 = 0 and N%100 > 0 then 1 else 0 end) as LMCount,
SUM(case when N%100 = 0 then 1 else 0 end ) as OTHCount
from
(
select
LoanNumber,
DD.date,
SUM(CASE WHEN HoldType = 'BK'then 1 else 0 End
+ case when HoldType = 'LM' then 10 else 0 end
+ case when HoldType = 'OTH' then 100 else 0 end) N
from
SDM.DateDim DD
inner join @Table t on DD.date >= t.StartDate and DD.date < t.EndDate
group by
LoanNumber,
DD.date
) s
Group by LoanNumber
order by LoanNumber;
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2013-03-23 : 06:05:51
Two approaches
-- Case 1 (triple pre-aggregations)
WITH cteSource(LoanNumber, BK, LM, OTH)
AS (
SELECT t.LoanNumber,
MAX(CASE WHEN t.HoldType = 'BK' THEN 1 ELSE 0 END) AS BK,
MAX(CASE WHEN t.HoldType = 'LM' THEN 1 ELSE 0 END) AS LM,
MAX(CASE WHEN t.HoldType = 'OTH' THEN 1 ELSE 0 END) AS OTH
FROM @Table AS t
INNER JOIN SDM.DateDim AS dd WITH (NOLOCK) ON dd.[Date] > t.StartDate
AND dd.[Date] <= t.EndDate
GROUP BY t.LoanNumber,
dd.[Date]
)
SELECT LoanNumber,
SUM(BK) AS BKCount,
SUM(CASE WHEN BK = 1 THEN 0 ELSE LM END) AS LMCount,
SUM(CASE WHEN BK = 1 THEN 0 WHEN LM = 1 THEN 0 ELSE OTH END) AS OTHCount
FROM cteSource
GROUP BY LoanNumber;

-- Case 2 (single pre-aggregation)
WITH cteSource(LoanNumber, HoldType)
AS (
SELECT t.LoanNumber,
MIN(t.HoldType) AS HoldType
FROM @Table AS t
INNER JOIN SDM.DateDim AS dd WITH (NOLOCK) ON dd.[Date] > t.StartDate
AND dd.[Date] <= t.EndDate
GROUP BY t.LoanNumber,
dd.[Date]
)
SELECT LoanNumber,
SUM(CASE WHEN HoldType = 'BK' THEN 1 ELSE 0 END) AS BKCount,
SUM(CASE WHEN HoldType = 'LM' THEN 1 ELSE 0 END) AS LMCount,
SUM(CASE WHEN HoldType = 'OTH' THEN 1 ELSE 0 END) AS OTHCount
FROM cteSource
GROUP BY LoanNumber;



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

cvipin
Yak Posting Veteran

51 Posts

Posted - 2013-03-25 : 19:01:52
Server: 32 CPU, 128 GB RAM
Total Loans: 100,810
Total # of holds on all loans: 298,016 (means 298,016 date ranges)

I executed all the scripts one by one, below are the results. (Time in seconds. This includes collecting list of loans with holds.)

Solution By First Run Second Run
Vipin 24 18
James 20 18
SwePeso I 16 16
SwePeso II 18 20

James, the output using your query is correct. Did not face any problem if there are only LM holds multiple of 10. The only change is (DD.date > t.StartDate and DD.date <= t.EndDate)

Thanks James, SwePeso for your valuable inputs. As of now I see that first case by SwePeso performs better. Do let me know if you find more solutions.

Thanks
Vipin
Go to Top of Page

James K
Master Smack Fu Yak Hacker

3873 Posts

Posted - 2013-03-25 : 20:38:12
quote:
As of now I see that first case by SwePeso performs better
Interesting. I would have thought that SwePeso's second query would perform the best.
Go to Top of Page
   

- Advertisement -