SQL Server Forums
Profile | Register | Active Topics | Members | Search | Forum FAQ
 
Register Now and get your question answered!
Username:
Password:
Save Password
Forgot your Password?

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Calculate days in overlapping Date Range
 New Topic  Reply to Topic
 Printer Friendly
Author Previous Topic Topic Next Topic  

cvipin
Starting Member

USA
45 Posts

Posted - 03/22/2013 :  13:08:34  Show Profile  Reply with Quote
Hi,

I have following data. There are start and end dates for each hold type. I need to calculate the number of days a hold was placed on a loan. The priority of hold is BK, LM then OTH. Meaning, if I calculate number of days LM hold was placed on loan, I need to subtract any overlapping days with BK. If calculating for OTH, then overlap of LM and BK needs to be checked.

DECLARE @Table TABLE
(
LoanNumber varchar(25),
HoldType varchar(3),
StartDate date,
EndDate date
)
INSERT INTO @Table VALUES ('1234', 'BK', '8/10/2010', '8/20/2010')
INSERT INTO @Table VALUES ('1234', 'BK', '8/25/2010', '9/10/2010')
INSERT INTO @Table VALUES ('1234', 'BK', '10/20/2010', '10/30/2010')
INSERT INTO @Table VALUES ('1234', 'LM', '7/14/2010', '7/25/2010')
INSERT INTO @Table VALUES ('1234', 'LM', '8/14/2010', '9/15/2010')
INSERT INTO @Table VALUES ('1234', 'LM', '9/20/2010', '10/25/2010')
INSERT INTO @Table VALUES ('1234', 'OTH', '7/10/2010', '7/20/2010')
INSERT INTO @Table VALUES ('1234', 'OTH', '8/22/2010', '9/12/2010')
INSERT INTO @Table VALUES ('1234', 'OTH', '10/22/2010', '11/30/2010')
INSERT INTO @Table VALUES ('5678', 'BK', '8/07/2010', '8/18/2010')
INSERT INTO @Table VALUES ('5678', 'BK', '8/24/2010', '9/13/2010')
INSERT INTO @Table VALUES ('5678', 'BK', '10/21/2010', '10/31/2010')
INSERT INTO @Table VALUES ('5678', 'LM', '7/12/2010', '7/29/2010')
INSERT INTO @Table VALUES ('5678', 'LM', '8/14/2010', '9/15/2010')
INSERT INTO @Table VALUES ('5678', 'LM', '9/20/2010', '10/25/2010')
INSERT INTO @Table VALUES ('5678', 'OTH', '7/10/2010', '7/20/2010')
INSERT INTO @Table VALUES ('5678', 'OTH', '8/22/2010', '9/12/2010')
INSERT INTO @Table VALUES ('5678', 'OTH', '10/22/2010', '11/30/2010')

Output expected:
LoanNumber BKCount LMCount OTHCount
1234 36 51 35
5678 41 56 32

I implemented the same with the use of DateDim as below, but this has become inefficient as the number of records are in millions.

;WITH CTE_All AS
(

SELECT DISTINCT LoanNumber, HoldType, DD.date
FROM @Table T
JOIN SDM.DateDim DD WITH(NOLOCK) ON DD.Date > T.StartDate AND DD.Date <= T.EndDate
)

SELECT LoanNumber, SUM(BKCount) BKCount, SUM(LMCount) LMCount, SUM(OTHCount) OTHCount
FROM
(
SELECT LoanNumber, COUNT(1) BKCount, 0 LMCount, 0 OTHCount
FROM CTE_All WHERE HoldType = 'BK'
GROUP BY LoanNumber

UNION ALL

SELECT LoanNumber, 0 BKCount, COUNT(1) LMCount, 0 OTHCount
FROM
(
SELECT LoanNumber, Date FROM CTE_All WHERE HoldType = 'LM'
EXCEPT
SELECT LoanNumber, Date FROM CTE_All WHERE HoldType = 'BK'
)T
GROUP BY LoanNumber

UNION ALL

SELECT LoanNumber, 0 BKCount, 0 LMCount, COUNT(1) OTHCount
FROM
(
SELECT LoanNumber, Date FROM CTE_All WHERE HoldType = 'OTH'
EXCEPT
SELECT DISTINCT LoanNumber, Date FROM CTE_All WHERE HoldType IN ('BK','LM')
)T
GROUP BY LoanNumber
)T
GROUP BY LoanNumber

Can you let me know if there is any efficient way to perform this task.

Thanks
Vipin

yosiasz
Flowing Fount of Yak Knowledge

USA
1635 Posts

Posted - 03/22/2013 :  16:37:13  Show Profile  Click to see yosiasz's MSN Messenger address  Reply with Quote
So are you saying your DateTime has millions of rows? do you have any sort of indexing on DD.Date?

instead of CTE and table variables use #temptables with indexes. this can improve your performance. also run a small sample of your query using execution plan to sniff out issues

what do you gain by using DimDate

SELECT DISTINCT LoanNumber, HoldType, DD.date
FROM @Table T
JOIN SDM.DateDim DD WITH(NOLOCK) ON DD.Date > T.StartDate AND DD.Date <= T.EndDate

I do not see any benefits other than grabbing dd.date which you already have in T. that part makes no sense.
<><><><><><><><><><><><><><><><><>
If you don't have the passion to help people, you have no passion

Edited by - yosiasz on 03/22/2013 16:39:01
Go to Top of Page

cvipin
Starting Member

USA
45 Posts

Posted - 03/22/2013 :  18:33:17  Show Profile  Reply with Quote
DateDim contains daily dates from 1/1/1992 till current year end. Using SDM.DateDim DD WITH(NOLOCK) ON DD.Date > T.StartDate AND DD.Date <= T.EndDate, I get daily dates for that range and it becomes easier to remove any overlap.

Ex: For BK: '8/10/2010', '8/20/2010', with above join I will get 10 rows for 8/11/2010, 8/12/2010,....8/20/2010
For LM: '7/14/2010', '7/25/2010' I get 7/15/2010, 7/16/2010....7/25/2010. So for one date range my logic creates those many rows (10 and 11 in this ex.).

Now to get net LM days I use except to remove any overlapping days.

SELECT LoanNumber, Date FROM CTE_All WHERE HoldType = 'LM'
EXCEPT
SELECT LoanNumber, Date FROM CTE_All WHERE HoldType = 'BK'

I have lot of loans with date ranges and my logic is causing the record count go so high (25 millions).

Let me know if there is any efficient way to tackle this requirement. I am using # table in my actual query.

Thanks
Vipin
Go to Top of Page

James K
Flowing Fount of Yak Knowledge

3636 Posts

Posted - 03/22/2013 :  18:41:55  Show Profile  Reply with Quote
Here somthing to get you started. It parses, but I have not tested it. I know it has a at least one problem - that it will give you the wrong results if the only type of hold was LM , and the number of days of the hold happened to be a multiple of 10. That logic can be fixed, and if it works correctly for other cases, it should be faster than the CTE approach
select
	LoanNumber,
	SUM(case when N%10 > 0 then 1 else 0 end) as BKCount,
	SUM(case when N%10 = 0 and N%100 > 0 then 1 else 0 end) as LMCount,
	SUM(case when N%100 = 0 then 1 else 0 end )  as OTHCount
from
( 
	select
		LoanNumber,
		DD.date,
		SUM(CASE WHEN HoldType = 'BK'then 1 else 0 End 
		+ case when HoldType = 'LM' then 10 else 0 end
		+ case when HoldType = 'OTH' then 100 else 0 end) N
	from
		SDM.DateDim  DD
		inner join @Table  t on DD.date >= t.StartDate and DD.date < t.EndDate
	group by
		LoanNumber,
		DD.date
) s 
Group by LoanNumber
order by LoanNumber;
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

Sweden
30240 Posts

Posted - 03/23/2013 :  06:05:51  Show Profile  Visit SwePeso's Homepage  Reply with Quote
Two approaches
-- Case 1 (triple pre-aggregations)
WITH cteSource(LoanNumber, BK, LM, OTH)
AS (
	SELECT		t.LoanNumber,
			MAX(CASE WHEN t.HoldType = 'BK' THEN 1 ELSE 0 END) AS BK,
			MAX(CASE WHEN t.HoldType = 'LM' THEN 1 ELSE 0 END) AS LM,
			MAX(CASE WHEN t.HoldType = 'OTH' THEN 1 ELSE 0 END) AS OTH
	FROM		@Table AS t
	INNER JOIN	SDM.DateDim AS dd WITH (NOLOCK) ON dd.[Date] > t.StartDate
				AND dd.[Date] <= t.EndDate
	GROUP BY	t.LoanNumber,
			dd.[Date]
)
SELECT		LoanNumber,
		SUM(BK) AS BKCount,
		SUM(CASE WHEN BK = 1 THEN 0 ELSE LM END) AS LMCount,
		SUM(CASE WHEN BK = 1 THEN 0 WHEN LM = 1 THEN 0 ELSE OTH END) AS OTHCount
FROM		cteSource
GROUP BY	LoanNumber;

-- Case 2 (single pre-aggregation)
WITH cteSource(LoanNumber, HoldType)
AS (
	SELECT		t.LoanNumber,
			MIN(t.HoldType) AS HoldType
	FROM		@Table AS t
	INNER JOIN	SDM.DateDim AS dd WITH (NOLOCK) ON dd.[Date] > t.StartDate
				AND dd.[Date] <= t.EndDate
	GROUP BY	t.LoanNumber,
			dd.[Date]
)
SELECT		LoanNumber,
		SUM(CASE WHEN HoldType = 'BK' THEN 1 ELSE 0 END) AS BKCount,
		SUM(CASE WHEN HoldType = 'LM' THEN 1 ELSE 0 END) AS LMCount,
		SUM(CASE WHEN HoldType = 'OTH' THEN 1 ELSE 0 END) AS OTHCount
FROM		cteSource
GROUP BY	LoanNumber;



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

cvipin
Starting Member

USA
45 Posts

Posted - 03/25/2013 :  19:01:52  Show Profile  Reply with Quote
Server: 32 CPU, 128 GB RAM
Total Loans: 100,810
Total # of holds on all loans: 298,016 (means 298,016 date ranges)

I executed all the scripts one by one, below are the results. (Time in seconds. This includes collecting list of loans with holds.)

Solution By First Run Second Run
Vipin 24 18
James 20 18
SwePeso I 16 16
SwePeso II 18 20

James, the output using your query is correct. Did not face any problem if there are only LM holds multiple of 10. The only change is (DD.date > t.StartDate and DD.date <= t.EndDate)

Thanks James, SwePeso for your valuable inputs. As of now I see that first case by SwePeso performs better. Do let me know if you find more solutions.

Thanks
Vipin
Go to Top of Page

James K
Flowing Fount of Yak Knowledge

3636 Posts

Posted - 03/25/2013 :  20:38:12  Show Profile  Reply with Quote
quote:
As of now I see that first case by SwePeso performs better
Interesting. I would have thought that SwePeso's second query would perform the best.
Go to Top of Page
  Previous Topic Topic Next Topic  
 New Topic  Reply to Topic
 Printer Friendly
Jump To:
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC Go To Top Of Page
This page was generated in 0.06 seconds. Powered By: Snitz Forums 2000