| Author |
Topic |
|
tfountain
Constraint Violating Yak Guru
491 Posts |
Posted - 2008-07-31 : 16:21:53
|
Ok, I have a cursor based on a query that returns datetime stamps from a table (StartDate, EndDate).DECLARE @MergedDates TABLE (StartTime DATETIME, EndTime DATETIME);DECLARE @StartTime DATETIME;DECLARE @EndTime DATETIME;DECLARE curDates CURSOR LOCAL FAST_FORWARDFORSELECT StartDate, EndDateFROM <database>.<schema>.<table>WHERE YEAR(StartDate) = 2007ORDER BY StartDate, EndDate;OPEN curDates;FETCH curDates INTO @StartTime, @EndTime;WHILE (@@FETCH_STATUS = 0) BEGIN IF EXISTS(SELECT TOP 1 1 FROM @MergedDates WHERE @StartTime BETWEEN StartTime AND EndTime) BEGIN UPDATE @MergedDates SET EndTime = @EndTime WHERE @StartTime BETWEEN StartTime AND EndTime /* If the end time is less then we update it */ AND EndTime < @EndTime; END; ELSE BEGIN INSERT INTO @MergedDates (StartTime, EndTime) VALUES (@StartTime, @EndTime); END FETCH curDates INTO @StartTime, @EndTime; END;CLOSE curDates;DEALLOCATE curDates; I have data in this table like so (StartDate, EndDate)2008-01-04 11:00:00.000 2008-01-04 14:00:00.0002008-01-04 11:30:00.000 2008-01-04 14:00:00.0002008-01-04 11:45:00.000 2008-01-04 14:30:00.0002008-01-04 16:00:00.000 2008-01-04 18:00:00.0002008-01-05 16:00:00.000 2008-01-05 18:00:00.0002008-01-06 16:00:00.000 2008-01-06 18:00:00.000I want to query it so that any records that overlap entirely are merged into one record. With the above data I would want this returned:2008-01-04 11:00:00.000 2008-01-04 14:30.00.0002008-01-04 16:00:00.000 2008-01-04 18:00:00.0002008-01-05 16:00:00.000 2008-01-05 18:00:00.0002008-01-06 16:00:00.000 2008-01-06 18:00:00.000Now the query above does the trick but I would like to try a solution that does not use cursors. I played a little bit with recursive CTEs but no luck so far.Any suggestions? |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
|
|
Vinnie881
Master Smack Fu Yak Hacker
1231 Posts |
Posted - 2008-07-31 : 18:03:26
|
| [code]DECLARE @MergedDates TABLE (StartTime DATETIME, EndTime DATETIME);Insert Into @MergedDatesselect '2008-01-04 11:00:00.000', '2008-01-04 14:00:00.000' Union Allselect '2008-01-04 11:30:00.000', '2008-01-04 14:00:00.000' Union Allselect '2008-01-04 11:45:00.000', '2008-01-04 14:30:00.000' Union Allselect '2008-01-04 16:00:00.000', '2008-01-04 18:00:00.000' Union Allselect '2008-01-05 16:00:00.000', '2008-01-05 18:00:00.000' Union Allselect '2008-01-06 16:00:00.000', '2008-01-06 18:00:00.000'select distinct Min(b.StartTime),Max(b.EndTime) from(Select Row_Number() over (order by a.StartTime)as RowID,*from@MergedDates a) aainner Join@MergedDates bon aa.Starttime between b.StartTime and b.Endtimeor aa.Endtime between b.StartTime and b.Endtimegroup by aa.RowID[/code] |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2008-08-01 : 04:01:17
|
Try this test dataselect '20080104 11:00', '20080104 14:00' Union Allselect '20080104 13:00', '20080104 17:00' Union Allselect '20080104 15:00', '20080104 18:00' E 12°55'05.25"N 56°04'39.16" |
 |
|
|
tfountain
Constraint Violating Yak Guru
491 Posts |
Posted - 2008-08-01 : 09:40:26
|
quote: Originally posted by Vinnie881
DECLARE @MergedDates TABLE (StartTime DATETIME, EndTime DATETIME);Insert Into @MergedDatesselect '2008-01-04 11:00:00.000', '2008-01-04 14:00:00.000' Union Allselect '2008-01-04 11:30:00.000', '2008-01-04 14:00:00.000' Union Allselect '2008-01-04 11:45:00.000', '2008-01-04 14:30:00.000' Union Allselect '2008-01-04 16:00:00.000', '2008-01-04 18:00:00.000' Union Allselect '2008-01-05 16:00:00.000', '2008-01-05 18:00:00.000' Union Allselect '2008-01-06 16:00:00.000', '2008-01-06 18:00:00.000'select distinct Min(b.StartTime),Max(b.EndTime) from(Select Row_Number() over (order by a.StartTime)as RowID,*from@MergedDates a) aainner Join@MergedDates bon aa.Starttime between b.StartTime and b.Endtimeor aa.Endtime between b.StartTime and b.Endtimegroup by aa.RowID
This is it. Nice, simple and it works! Thanks! I'm going to dissect this a little to understand why it does exactly but this is perfect. I've ran this against my real data and manually compared the results to the raw data and this accounts for all merge scenarios perfectly. |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2008-08-01 : 10:10:27
|
Even when you try with my posted sample data 08/01/2008 : 04:01:17 ?quote:
select '20080104 11:00', '20080104 14:00' Union Allselect '20080104 13:00', '20080104 17:00' Union Allselect '20080104 15:00', '20080104 18:00'
The result should beselect '20080104 11:00', '20080104 18:00' right? E 12°55'05.25"N 56°04'39.16" |
 |
|
|
tfountain
Constraint Violating Yak Guru
491 Posts |
Posted - 2008-08-01 : 10:59:19
|
quote: Originally posted by Peso Even when you try with my posted sample data 08/01/2008 : 04:01:17 ?quote:
select '20080104 11:00', '20080104 14:00' Union Allselect '20080104 13:00', '20080104 17:00' Union Allselect '20080104 15:00', '20080104 18:00'
The result should beselect '20080104 11:00', '20080104 18:00' right? E 12°55'05.25"N 56°04'39.16"
Yes, you are correct. I'll tinker with this a bit. |
 |
|
|
tfountain
Constraint Violating Yak Guru
491 Posts |
Posted - 2008-08-01 : 11:20:00
|
I see what is going on and I think it can be solved by a recursive CTE. Take your base data:Row #1: '2008-01-04 11:00:00.000', '2008-01-04 14:00:00.000'Row #2: '2008-01-04 13:00:00.000', '2008-01-04 17:00:00.000'Row #3: '2008-01-04 15:00:00.000', '2008-01-04 18:00:00.000' The results are:Row #A: '2008-01-04 11:00:00.000', '2008-01-04 17:00:00.000'Row #B: '2008-01-04 11:00:00.000', '2008-01-04 18:00:00.000'Row #C: '2008-01-04 13:00:00.000', '2008-01-04 18:00:00.000' The results are showing rows row #1 & #2 merged, #1 & #3 merged, and #2 and #3 merged.I'll post the results in a bit after I toy with this. |
 |
|
|
Vinnie881
Master Smack Fu Yak Hacker
1231 Posts |
Posted - 2008-08-01 : 12:29:44
|
I just verified there are errors in this query as well. Use a CTE, it'll be easier, but anyway here is where I am at. It works in MOST scenerios but not all.select Min(aaaa.Start) as Start,aaaa.[End]From( select aaa.Start,max(bbb.[EndTime]) as [End] from ( select RowID,Min(b.StartTime) as start,Max (b.EndTime) as [endTime] from ( Select Row_Number() over (order by a.StartTime)as RowID,* from @MergedDates a ) aaInner Join @MergedDates b on aa.Starttime between b.StartTime and b.Endtime or aa.Endtime between b.StartTime and b.Endtime group by RowID ) aaa Inner Join @MergedDates bbb on aaa.Start between bbb.StartTime and bbb.Endtime or aaa.[EndTime] between bbb.StartTime and bbb.Endtimegroup by Start) aaaagroup by [End] It's a mess to look at and you'll probably want to clean it up a little, but see if it relieves the issue. |
 |
|
|
tfountain
Constraint Violating Yak Guru
491 Posts |
Posted - 2008-08-01 : 13:13:54
|
| @Vinnie881This does appear to correct this issue with the sample data Peso posted as well as the sample data I posted. I'm curious though, what situations do you think this will not work for (your latest query)? |
 |
|
|
Vinnie881
Master Smack Fu Yak Hacker
1231 Posts |
Posted - 2008-08-01 : 13:26:17
|
| DECLARE @MergedDates TABLE (StartTime DATETIME, EndTime DATETIME);Insert Into @MergedDatesselect '2008-01-04 11:00:00.000', '2008-01-04 14:00:00.000' Union Allselect '2008-01-04 11:30:00.000', '2008-01-04 14:00:00.000' Union Allselect '2008-01-04 11:45:00.000', '2008-01-04 14:30:00.000' Union Allselect '2008-01-04 16:00:00.000', '2008-01-04 18:00:00.000' Union Allselect '2008-01-05 16:00:00.000', '2008-01-05 18:00:00.000' Union Allselect '2008-01-06 16:00:00.000', '2008-01-06 18:00:00.000' Union All select '20080104 11:00', '20080104 14:00' Union Allselect '20080104 13:00', '20080104 17:00' Union Allselect '20080104 15:00', '20080104 18:00' Union Allselect '20080104 18:00', '20080106 18:00' Union allselect '20080106 15:00', '20080108 18:00' that last record above will throw it off. Peso is one of the most knowledgeable individuals on this site (if not the most) when it comes to this type of question. If he suggested a CTE, he's likely accurate in his reasoning. So far I have not found a 100% method using the methodolgy I illustrated. I am thinking of an approach with a initial cross join then start the filtering from there might work, but a CTE is a easer method. |
 |
|
|
tfountain
Constraint Violating Yak Guru
491 Posts |
Posted - 2008-08-01 : 13:33:01
|
| Actually, I brought up the topic of using some sort of recursive CTE but I just can't figure it out yet. Peso's examples both utilized nested queries such as yours. I haven't quite figured out how to convert his examples to my approach but with the help of both of you I am getting closer.I'll keep you posted. |
 |
|
|
Vinnie881
Master Smack Fu Yak Hacker
1231 Posts |
Posted - 2008-08-01 : 15:37:00
|
| I tested the methodology of Creating a table with every date by second between the two dates. It works but runs extrordianry slow that a while statement is much quicker.I am interested to know the solution you come up with. |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
|
|
tfountain
Constraint Violating Yak Guru
491 Posts |
Posted - 2008-08-01 : 16:51:30
|
quote: Originally posted by Peso Use the "clustered index update" trick as seen herehttp://weblogs.sqlteam.com/peterl/archive/2008/05/13/Lightning-fast-collapsed-date-ranges-and-missing-date-ranges.aspxquote: For 1,000 date pairs the algorithm runs in 80 ms.For 10,000 date pairs the algorithm runs in 360 ms.For 100,000 date pairs the algorithm runs in 900 ms.For 1,000,000 date pairs the algorithm runs in 2250 ms.
E 12°55'05.25"N 56°04'39.16"
I have been looking at this since you posted it originally and I have to admit, I'm having a tough time translating it to my requirement. Any chance you would be able to shed some light on this? |
 |
|
|
|