Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
cvipin
Yak Posting Veteran
51 Posts |
Posted - 2013-03-22 : 13:08:34
|
Hi,I have following data. There are start and end dates for each hold type. I need to calculate the number of days a hold was placed on a loan. The priority of hold is BK, LM then OTH. Meaning, if I calculate number of days LM hold was placed on loan, I need to subtract any overlapping days with BK. If calculating for OTH, then overlap of LM and BK needs to be checked.DECLARE @Table TABLE( LoanNumber varchar(25), HoldType varchar(3), StartDate date, EndDate date)INSERT INTO @Table VALUES ('1234', 'BK', '8/10/2010', '8/20/2010')INSERT INTO @Table VALUES ('1234', 'BK', '8/25/2010', '9/10/2010')INSERT INTO @Table VALUES ('1234', 'BK', '10/20/2010', '10/30/2010')INSERT INTO @Table VALUES ('1234', 'LM', '7/14/2010', '7/25/2010')INSERT INTO @Table VALUES ('1234', 'LM', '8/14/2010', '9/15/2010')INSERT INTO @Table VALUES ('1234', 'LM', '9/20/2010', '10/25/2010')INSERT INTO @Table VALUES ('1234', 'OTH', '7/10/2010', '7/20/2010')INSERT INTO @Table VALUES ('1234', 'OTH', '8/22/2010', '9/12/2010')INSERT INTO @Table VALUES ('1234', 'OTH', '10/22/2010', '11/30/2010')INSERT INTO @Table VALUES ('5678', 'BK', '8/07/2010', '8/18/2010')INSERT INTO @Table VALUES ('5678', 'BK', '8/24/2010', '9/13/2010')INSERT INTO @Table VALUES ('5678', 'BK', '10/21/2010', '10/31/2010')INSERT INTO @Table VALUES ('5678', 'LM', '7/12/2010', '7/29/2010')INSERT INTO @Table VALUES ('5678', 'LM', '8/14/2010', '9/15/2010')INSERT INTO @Table VALUES ('5678', 'LM', '9/20/2010', '10/25/2010')INSERT INTO @Table VALUES ('5678', 'OTH', '7/10/2010', '7/20/2010')INSERT INTO @Table VALUES ('5678', 'OTH', '8/22/2010', '9/12/2010')INSERT INTO @Table VALUES ('5678', 'OTH', '10/22/2010', '11/30/2010')Output expected:LoanNumber BKCount LMCount OTHCount1234 36 51 355678 41 56 32I implemented the same with the use of DateDim as below, but this has become inefficient as the number of records are in millions.;WITH CTE_All AS( SELECT DISTINCT LoanNumber, HoldType, DD.date FROM @Table T JOIN SDM.DateDim DD WITH(NOLOCK) ON DD.Date > T.StartDate AND DD.Date <= T.EndDate) SELECT LoanNumber, SUM(BKCount) BKCount, SUM(LMCount) LMCount, SUM(OTHCount) OTHCount FROM ( SELECT LoanNumber, COUNT(1) BKCount, 0 LMCount, 0 OTHCount FROM CTE_All WHERE HoldType = 'BK' GROUP BY LoanNumber UNION ALL SELECT LoanNumber, 0 BKCount, COUNT(1) LMCount, 0 OTHCount FROM ( SELECT LoanNumber, Date FROM CTE_All WHERE HoldType = 'LM' EXCEPT SELECT LoanNumber, Date FROM CTE_All WHERE HoldType = 'BK' )T GROUP BY LoanNumber UNION ALL SELECT LoanNumber, 0 BKCount, 0 LMCount, COUNT(1) OTHCount FROM ( SELECT LoanNumber, Date FROM CTE_All WHERE HoldType = 'OTH' EXCEPT SELECT DISTINCT LoanNumber, Date FROM CTE_All WHERE HoldType IN ('BK','LM') )T GROUP BY LoanNumber )T GROUP BY LoanNumberCan you let me know if there is any efficient way to perform this task.ThanksVipin |
|
yosiasz
Master Smack Fu Yak Hacker
1635 Posts |
Posted - 2013-03-22 : 16:37:13
|
So are you saying your DateTime has millions of rows? do you have any sort of indexing on DD.Date?instead of CTE and table variables use #temptables with indexes. this can improve your performance. also run a small sample of your query using execution plan to sniff out issueswhat do you gain by using DimDateSELECT DISTINCT LoanNumber, HoldType, DD.dateFROM @Table TJOIN SDM.DateDim DD WITH(NOLOCK) ON DD.Date > T.StartDate AND DD.Date <= T.EndDateI do not see any benefits other than grabbing dd.date which you already have in T. that part makes no sense.<><><><><><><><><><><><><><><><><>If you don't have the passion to help people, you have no passion |
|
|
cvipin
Yak Posting Veteran
51 Posts |
Posted - 2013-03-22 : 18:33:17
|
DateDim contains daily dates from 1/1/1992 till current year end. Using SDM.DateDim DD WITH(NOLOCK) ON DD.Date > T.StartDate AND DD.Date <= T.EndDate, I get daily dates for that range and it becomes easier to remove any overlap.Ex: For BK: '8/10/2010', '8/20/2010', with above join I will get 10 rows for 8/11/2010, 8/12/2010,....8/20/2010For LM: '7/14/2010', '7/25/2010' I get 7/15/2010, 7/16/2010....7/25/2010. So for one date range my logic creates those many rows (10 and 11 in this ex.). Now to get net LM days I use except to remove any overlapping days.SELECT LoanNumber, Date FROM CTE_All WHERE HoldType = 'LM' EXCEPT SELECT LoanNumber, Date FROM CTE_All WHERE HoldType = 'BK'I have lot of loans with date ranges and my logic is causing the record count go so high (25 millions).Let me know if there is any efficient way to tackle this requirement. I am using # table in my actual query.ThanksVipin |
|
|
James K
Master Smack Fu Yak Hacker
3873 Posts |
Posted - 2013-03-22 : 18:41:55
|
Here somthing to get you started. It parses, but I have not tested it. I know it has a at least one problem - that it will give you the wrong results if the only type of hold was LM , and the number of days of the hold happened to be a multiple of 10. That logic can be fixed, and if it works correctly for other cases, it should be faster than the CTE approachselect LoanNumber, SUM(case when N%10 > 0 then 1 else 0 end) as BKCount, SUM(case when N%10 = 0 and N%100 > 0 then 1 else 0 end) as LMCount, SUM(case when N%100 = 0 then 1 else 0 end ) as OTHCountfrom( select LoanNumber, DD.date, SUM(CASE WHEN HoldType = 'BK'then 1 else 0 End + case when HoldType = 'LM' then 10 else 0 end + case when HoldType = 'OTH' then 100 else 0 end) N from SDM.DateDim DD inner join @Table t on DD.date >= t.StartDate and DD.date < t.EndDate group by LoanNumber, DD.date) s Group by LoanNumberorder by LoanNumber; |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2013-03-23 : 06:05:51
|
Two approaches-- Case 1 (triple pre-aggregations)WITH cteSource(LoanNumber, BK, LM, OTH)AS ( SELECT t.LoanNumber, MAX(CASE WHEN t.HoldType = 'BK' THEN 1 ELSE 0 END) AS BK, MAX(CASE WHEN t.HoldType = 'LM' THEN 1 ELSE 0 END) AS LM, MAX(CASE WHEN t.HoldType = 'OTH' THEN 1 ELSE 0 END) AS OTH FROM @Table AS t INNER JOIN SDM.DateDim AS dd WITH (NOLOCK) ON dd.[Date] > t.StartDate AND dd.[Date] <= t.EndDate GROUP BY t.LoanNumber, dd.[Date])SELECT LoanNumber, SUM(BK) AS BKCount, SUM(CASE WHEN BK = 1 THEN 0 ELSE LM END) AS LMCount, SUM(CASE WHEN BK = 1 THEN 0 WHEN LM = 1 THEN 0 ELSE OTH END) AS OTHCountFROM cteSourceGROUP BY LoanNumber;-- Case 2 (single pre-aggregation)WITH cteSource(LoanNumber, HoldType)AS ( SELECT t.LoanNumber, MIN(t.HoldType) AS HoldType FROM @Table AS t INNER JOIN SDM.DateDim AS dd WITH (NOLOCK) ON dd.[Date] > t.StartDate AND dd.[Date] <= t.EndDate GROUP BY t.LoanNumber, dd.[Date])SELECT LoanNumber, SUM(CASE WHEN HoldType = 'BK' THEN 1 ELSE 0 END) AS BKCount, SUM(CASE WHEN HoldType = 'LM' THEN 1 ELSE 0 END) AS LMCount, SUM(CASE WHEN HoldType = 'OTH' THEN 1 ELSE 0 END) AS OTHCountFROM cteSourceGROUP BY LoanNumber; N 56°04'39.26"E 12°55'05.63" |
|
|
cvipin
Yak Posting Veteran
51 Posts |
Posted - 2013-03-25 : 19:01:52
|
Server: 32 CPU, 128 GB RAMTotal Loans: 100,810Total # of holds on all loans: 298,016 (means 298,016 date ranges)I executed all the scripts one by one, below are the results. (Time in seconds. This includes collecting list of loans with holds.)Solution By First Run Second RunVipin 24 18James 20 18SwePeso I 16 16SwePeso II 18 20James, the output using your query is correct. Did not face any problem if there are only LM holds multiple of 10. The only change is (DD.date > t.StartDate and DD.date <= t.EndDate)Thanks James, SwePeso for your valuable inputs. As of now I see that first case by SwePeso performs better. Do let me know if you find more solutions.ThanksVipin |
|
|
James K
Master Smack Fu Yak Hacker
3873 Posts |
Posted - 2013-03-25 : 20:38:12
|
quote: As of now I see that first case by SwePeso performs better
Interesting. I would have thought that SwePeso's second query would perform the best. |
|
|
|
|
|
|
|