Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Overlapping Date Ranges - Remove dup Permutations

Author  Topic 

LaurieCox

158 Posts

Posted - 2014-07-22 : 15:39:21
So I have this table (table ddl and data inserts at end of post):

Id ClientId StartDate EndDate
----------- ----------- ---------- ----------
1 208 2013-07-17 2014-07-16
2 208 2013-07-17 2014-07-16
3 21071 2014-05-22 2014-07-14
4 21071 2014-07-14 2015-07-13
5 21071 2014-04-03 2014-08-01
6 29116 2014-05-23 2015-05-22
7 29116 2014-06-12 2015-06-11
8 29116 2014-05-23 2015-05-22
9 62716 2014-06-23 2015-06-22
10 62716 2014-06-08 2015-06-07
11 62716 2014-05-23 2015-05-22
12 90080 2014-05-21 2014-07-02
13 90080 2014-07-03 2015-07-02
14 90080 2014-05-21 2015-05-20

I want to find all pairs of records (for a given ClientId) where the date ranges overlap. I do not count it an overlap if Start Date of one record = End Date of another for the given ClientId.

I have written this query:

SELECT A.ClientId
, A.Id as AId
, B.Id as BId
, A.StartDate as AStartDate
, A.EndDate as AEndDate
, B.StartDate as BStartDate
, B.EndDate as BEndDate
FROM TestDates a
join TestDates b on a.ClientId = b.ClientId
and a.Id <> b.Id
and a.StartDate >= b.StartDate
and a.StartDate < b.EndDate

Which I think works. It finds the overlapping rows for all of the test cases I could think of at least. It gives me this result from the test data (row column added for reference):

row ClientId ARowId BRowId AStartDate AEndDate BStartDate BEndDate
---- ----------- ----------- ----------- ---------- ---------- ---------- ----------
A 208 1 2 2013-07-17 2014-07-16 2013-07-17 2014-07-16
B 208 2 1 2013-07-17 2014-07-16 2013-07-17 2014-07-16
C 21071 3 5 2014-05-22 2014-07-14 2014-04-03 2014-08-01
D 21071 4 5 2014-07-14 2015-07-13 2014-04-03 2014-08-01
E 29116 6 8 2014-05-23 2015-05-22 2014-05-23 2015-05-22
F 29116 7 6 2014-06-12 2015-06-11 2014-05-23 2015-05-22
G 29116 7 8 2014-06-12 2015-06-11 2014-05-23 2015-05-22
H 29116 8 6 2014-05-23 2015-05-22 2014-05-23 2015-05-22
I 62716 9 10 2014-06-23 2015-06-22 2014-06-08 2015-06-07
J 62716 9 11 2014-06-23 2015-06-22 2014-05-23 2015-05-22
K 62716 10 11 2014-06-08 2015-06-07 2014-05-23 2015-05-22
L 90080 12 14 2014-05-21 2014-07-02 2014-05-21 2015-05-20
M 90080 13 14 2014-07-03 2015-07-02 2014-05-21 2015-05-20
N 90080 14 12 2014-05-21 2015-05-20 2014-05-21 2014-07-02

The problem is on some of the overlapping rows it gives me both permutations. For example for ClientId 29116

row ClientId AId BId AStartDate AEndDate BStartDate BEndDate
--- ----------- ----------- ----------- ---------- ---------- ---------- ----------
E 29116 6 8 2014-05-23 2015-05-22 2014-05-23 2015-05-22
F 29116 7 6 2014-06-12 2015-06-11 2014-05-23 2015-05-22
G 29116 7 8 2014-06-12 2015-06-11 2014-05-23 2015-05-22
H 29116 8 6 2014-05-23 2015-05-22 2014-05-23 2015-05-22

Row E and row H are really showing the same overlap of records 6 and 8 so I only want one of them in my result set. Note: I bolded the duplicates in the test data above. I only want one of each pair in my result set. I do not care which one.

I did notice that the ones that give me both permutations the Start Dates are the same. I am not sure if this would always be the case. I first (stupidly) added "and a.StartDate <> b.StartDate" and then immediately realized that that got rid of both rows.

So how to I eliminate the duplicated overlaps?

And though my code worked on all the test cases I could think I am not 100% confident it finds all possible overlaps. Am I missing anything?

Thanks,

Laurie

create table TestDates
(Id int IDENTITY(1, 1)
,ClientId int
,StartDate Date
,EndDate Date)

INSERT INTO TestDates
Select 208, '2013-07-17', '2014-07-16' union all
Select 208, '2013-07-17', '2014-07-16' union all
Select 21071, '2014-05-22', '2014-07-14' union all
Select 21071, '2014-07-14', '2015-07-13' union all
Select 21071, '2014-04-03', '2014-08-01' union all
Select 29116, '2014-05-23', '2015-05-22' union all
Select 29116, '2014-06-12', '2015-06-11' union all
Select 29116, '2014-05-23', '2015-05-22' union all
Select 62716, '2014-06-23', '2015-06-22' union all
Select 62716, '2014-06-08', '2015-06-07' union all
Select 62716, '2014-05-23', '2015-05-22' union all
Select 90080, '2014-05-21', '2014-07-02' union all
Select 90080, '2014-07-03', '2015-07-02' union all
Select 90080, '2014-05-21', '2015-05-20'


SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2014-07-22 : 16:33:36
[code]and a.StartDate <= b.EndDate
and a.EndDate >= b.StartDate[/code]


Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2014-07-22 : 16:35:04
Also see http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=195286


Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA
Go to Top of Page

LaurieCox

158 Posts

Posted - 2014-07-23 : 09:58:57
Hi SwePeso,

Thanks for your reply. It does not seem to do what I want. When I add those two statements to my existing query it gives me the same results (i.e. it does not remove one of the duplicate overlaps).

Modified query:

SELECT A.ClientId
, A.Id as ARowId
, B.Id as BRowId
, A.StartDate as AStartDate
, A.EndDate as AEndDate
, B.StartDate as BStartDate
, B.EndDate as BEndDate
FROM TestDates a
join TestDates b on a.ClientId = b.ClientId
and a.Id <> b.Id
and a.StartDate >= b.StartDate
and a.StartDate < b.EndDate
and a.StartDate <= b.EndDate
and a.EndDate >= b.StartDate

Results:

ClientId ARowId BRowId AStartDate AEndDate BStartDate BEndDate
----------- ----------- ----------- ---------- ---------- ---------- ----------
208 2 1 2013-07-17 2014-07-16 2013-07-17 2014-07-16
208 1 2 2013-07-17 2014-07-16 2013-07-17 2014-07-16
21071 3 5 2014-05-22 2014-07-14 2014-04-03 2014-08-01
21071 4 5 2014-07-14 2015-07-13 2014-04-03 2014-08-01
29116 7 6 2014-06-12 2015-06-11 2014-05-23 2015-05-22
29116 8 6 2014-05-23 2015-05-22 2014-05-23 2015-05-22
29116 6 8 2014-05-23 2015-05-22 2014-05-23 2015-05-22
29116 7 8 2014-06-12 2015-06-11 2014-05-23 2015-05-22
62716 9 10 2014-06-23 2015-06-22 2014-06-08 2015-06-07
62716 9 11 2014-06-23 2015-06-22 2014-05-23 2015-05-22
62716 10 11 2014-06-08 2015-06-07 2014-05-23 2015-05-22
90080 14 12 2014-05-21 2015-05-20 2014-05-21 2014-07-02
90080 12 14 2014-05-21 2014-07-02 2014-05-21 2015-05-20
90080 13 14 2014-07-03 2015-07-02 2014-05-21 2015-05-20

The rows spit out in a slightly different order but as you can see it still includes the duplicate (row permutation) overlaps. For example for ClientId 29116 it still includes overlap rows showing 8 overlaps 6 and 6 overlaps 8 (bolded above). As these are functionally the same overlap I only need one in the result set. I do not care which one but I only want one.

Unfortunately I have been juggling some stuff and well not have time this morning to read the thread in the link in thoroughly but I did glance at it and noted that one of posts showed only using the clause you suggested so I ran this query:

SELECT A.ClientId
, A.Id as ARowId
, B.Id as BRowId
, A.StartDate as AStartDate
, A.EndDate as AEndDate
, B.StartDate as BStartDate
, B.EndDate as BEndDate
FROM TestDates a
join TestDates b on a.ClientId = b.ClientId
and a.Id <> b.Id
--and a.StartDate >= b.StartDate
--and a.StartDate < b.EndDate
and a.StartDate <= b.EndDate
and a.EndDate >= b.StartDate

And got this result:

ClientId ARowId BRowId AStartDate AEndDate BStartDate BEndDate
----------- ----------- ----------- ---------- ---------- ---------- ----------
208 2 1 2013-07-17 2014-07-16 2013-07-17 2014-07-16
208 1 2 2013-07-17 2014-07-16 2013-07-17 2014-07-16
21071 4 3 2014-07-14 2015-07-13 2014-05-22 2014-07-14
21071 5 3 2014-04-03 2014-08-01 2014-05-22 2014-07-14
21071 3 4 2014-05-22 2014-07-14 2014-07-14 2015-07-13
21071 5 4 2014-04-03 2014-08-01 2014-07-14 2015-07-13
21071 3 5 2014-05-22 2014-07-14 2014-04-03 2014-08-01
21071 4 5 2014-07-14 2015-07-13 2014-04-03 2014-08-01
29116 7 6 2014-06-12 2015-06-11 2014-05-23 2015-05-22
29116 8 6 2014-05-23 2015-05-22 2014-05-23 2015-05-22
29116 6 7 2014-05-23 2015-05-22 2014-06-12 2015-06-11
29116 8 7 2014-05-23 2015-05-22 2014-06-12 2015-06-11
29116 6 8 2014-05-23 2015-05-22 2014-05-23 2015-05-22
29116 7 8 2014-06-12 2015-06-11 2014-05-23 2015-05-22
62716 10 9 2014-06-08 2015-06-07 2014-06-23 2015-06-22
62716 11 9 2014-05-23 2015-05-22 2014-06-23 2015-06-22
62716 9 10 2014-06-23 2015-06-22 2014-06-08 2015-06-07
62716 11 10 2014-05-23 2015-05-22 2014-06-08 2015-06-07
62716 9 11 2014-06-23 2015-06-22 2014-05-23 2015-05-22
62716 10 11 2014-06-08 2015-06-07 2014-05-23 2015-05-22
90080 14 12 2014-05-21 2015-05-20 2014-05-21 2014-07-02
90080 14 13 2014-05-21 2015-05-20 2014-07-03 2015-07-02
90080 12 14 2014-05-21 2014-07-02 2014-05-21 2015-05-20
90080 13 14 2014-07-03 2015-07-02 2014-05-21 2015-05-20

Which is actually adding more permutations for the matches then the original query had.

Anyway when I have the time I will look more closely at the script in the other thread.

Laurie







Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2014-07-23 : 10:03:24
I've put the date filter on the WHERE clause for brevity
SELECT		A.ClientId,
A.Id as AId,
B.Id as BId,
A.StartDate as AStartDate,
A.EndDate as AEndDate,
B.StartDate as BStartDate,
B.EndDate as BEndDate
FROM dbo.TestDates AS a
INNER JOIN dbo.TestDates AS b ON b.ClientId = a.ClientId
AND b.Id > b.Id
WHERE a.StartDate <= b.EndDate
AND a.EndDate >= b.StartDate;



Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA
Go to Top of Page

LaurieCox

158 Posts

Posted - 2014-07-23 : 12:36:29
quote:
Originally posted by SwePeso

I've put the date filter on the WHERE clause for brevity
SELECT		A.ClientId,
A.Id as AId,
B.Id as BId,
A.StartDate as AStartDate,
A.EndDate as AEndDate,
B.StartDate as BStartDate,
B.EndDate as BEndDate
FROM dbo.TestDates AS a
INNER JOIN dbo.TestDates AS b ON b.ClientId = a.ClientId
AND b.Id > b.Id
WHERE a.StartDate <= b.EndDate
AND a.EndDate >= b.StartDate;



Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA


So I ran your query and got zero rows. But I really had to work on something else so I figured out I would come back to it later.

Then I realized I really needed to it to complete what I was doing and figured I would work on it. So I replaced my two 'on' statements with your where clause and ran it against my real data. I got a bunch of rows.

So I looked more closely at your query and saw that you had changed my a.Id <> b.id to b.Id > b.Id and a light bulb went off.

So I corrected your typo to a.Id > b.Id and it worked. I really should have tried that before I even created this thread because the solution is obvious. I was comparing a to b and b to a. Of course I am going to get a row for each comparison.

Thank you very much,

Laurie
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2014-07-23 : 14:52:51
Great you got it working.


Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA
Go to Top of Page

LaurieCox

158 Posts

Posted - 2014-08-01 : 09:53:13
It turns out there are open ended ranges (i.e. EndDate is null). So I modified the query to this:

SELECT A.ClientId,
A.Id as AId,
B.Id as BId,
A.StartDate as AStartDate,
A.EndDate as AEndDate,
B.StartDate as BStartDate,
B.EndDate as BEndDate
FROM dbo.TestDates AS a
INNER JOIN dbo.TestDates AS b ON b.ClientId = a.ClientId
AND a.Id > b.Id
WHERE (a.StartDate <= b.EndDate
AND a.EndDate >= b.StartDate)
or (a.EndDate is null and b.EndDate is null)
or (a.EndDate is null and a.StartDate <= b.EndDate)
or (b.EndDate is null and b.StartDate <= a.EndDate)

I ran it against some test cases (see end of post for data) and it seemed to work.

Questions:
I have I missed any test cases?
(because I always want to learn) Is there a better way to check for open ended ranges?

Thanks,

Laurie

Test data:

INSERT INTO TestDates

select 100, '2014-01-01', null union all --overlap
select 100, '2015-01-01', null union all

select 200, '2014-02-05', null union all --overlap
select 200, '2013-01-01','2014-03-01' union all

select 300, '2014-02-05', null union all --no overlap
select 300, '2013-01-01', '2013-03-01' union all

select 400, '2014-01-01', '2015-01-01' union all --overlap
select 400, '2013-01-01', null union all

select 500, '2014-01-01', '2015-01-01' union all --no overlap
select 500, '2015-01-02', null

Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2014-08-01 : 11:54:00
[code]SELECT A.ClientId,
A.Id as AId,
B.Id as BId,
A.StartDate as AStartDate,
A.EndDate as AEndDate,
B.StartDate as BStartDate,
B.EndDate as BEndDate
FROM dbo.TestDates AS a
INNER JOIN dbo.TestDates AS b ON b.ClientId = a.ClientId
AND b.Id > a.Id
WHERE a.StartDate <= ISNULL(b.EndDate, '99991231')
AND ISNULL(a.EndDate, '99991231') >= b.StartDate;[/code]


Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA
Go to Top of Page

LaurieCox

158 Posts

Posted - 2014-08-01 : 13:00:50
Hi SwePeso,

That makes sense. Should have thought of it myself.

Thanks,

Laurie
Go to Top of Page
   

- Advertisement -