Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2012 Forums
 Transact-SQL (2012)
 Best way to map timespans to calendardays

Author  Topic 

CleaningWoman
Starting Member

13 Posts

Posted - 2014-12-17 : 08:23:16
Hi SQL-Experts,

i am confronted with a problem which i guess is already solved, but i do not now how to search for it or the best way to solve it, so i registered in the hope to find some nerd-expert-answers.

I have:
about 10,000,000 entries [car_id], [starttime], [endtime] (between 2011 and today)

I need:
A calendar list from 2011 until today, which contains the daily time a car is being used and which [car_id] is involved for each day (might be multiple times a day).
[date_id] [car_id] [starttime] [endtime]

Due to bad DB design i have about 12,000 different [car_id] which i cannot reduce to a lower number.

So the expert-sql-question is:
Does anyone have a good script solution to map a timespan over different calendar days to a one by one list of daily timespans?

My current way would be to scan each [car_id] in time-order and break it down into days and insert the results into a new table or to scan day by day all entries with starttime on that day and process this list via a #tempdb
anyway it will result in about 50,000,000 inserts.

What do you suggest?

Thank you very much, for even reading until here and even more having an answer.

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2014-12-17 : 08:49:22
Would you please post some examples of input and output?
Go to Top of Page

CleaningWoman
Starting Member

13 Posts

Posted - 2014-12-17 : 10:08:42
quote:
Originally posted by gbritton

Would you please post some examples of input and output?



Yes, of course

available source-data

car_id starttime endtime duration
64 2013-05-23 13:36:42.000 2013-05-25 04:10:51.000 138849
103 2013-05-23 20:38:43.000 2013-05-25 03:40:55.000 111732
104 2013-05-23 14:37:41.000 2013-05-25 04:10:02.000 135141
171 2013-05-23 11:32:21.000 2013-05-25 04:07:20.000 146099
189 2013-05-23 16:33:27.000 2013-05-27 11:05:44.000 325937
3283 2013-05-23 17:12:55.000 2013-05-25 14:53:57.000 164462
3956 2013-05-23 09:49:20.000 2013-05-24 10:29:51.000 88831
4107 2013-05-23 07:20:20.000 2013-05-24 09:35:09.000 94489
5237 2013-05-23 13:16:20.000 2013-05-25 00:49:04.000 127964
13813 2013-05-23 12:47:57.000 2013-05-24 15:17:29.000 95372
14265 2013-05-23 11:24:06.000 2013-05-24 13:32:00.000 94074
30829 2013-05-23 08:16:23.000 2013-05-24 10:13:40.000 93437
30832 2013-05-23 05:58:36.000 2013-05-24 06:08:28.000 86992


Well, i do not have result data yet. I expect the first row to result in the following entries:

[car_id][date_id] [starttime] [endtime]
64 20130523 13:36:42.000 23:59:59.000
64 20130524 00:00:00.000 23:59:59.000
64 20130525 00:00:00.000 04:10:51.000



Go to Top of Page

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2014-12-17 : 11:15:47
Tricky! Check this out:


declare @t table (car_id int, starttime datetime, endtime datetime, duration int)
insert into @t(car_id, starttime, endtime, duration) values

(64 ,'2013-05-23 13:36:42.000', '2013-05-25 04:10:51.000', 138849 ),
(103 ,'2013-05-23 20:38:43.000', '2013-05-25 03:40:55.000', 111732 ),
(104 ,'2013-05-23 14:37:41.000', '2013-05-25 04:10:02.000', 135141 ),
(171 ,'2013-05-23 11:32:21.000', '2013-05-25 04:07:20.000', 146099 ),
(189 ,'2013-05-23 16:33:27.000', '2013-05-27 11:05:44.000', 325937 ),
(3283 ,'2013-05-23 17:12:55.000', '2013-05-25 14:53:57.000', 164462 ),
(3956 ,'2013-05-23 09:49:20.000', '2013-05-24 10:29:51.000', 88831 ),
(4107 ,'2013-05-23 07:20:20.000', '2013-05-24 09:35:09.000', 94489 ),
(5237 ,'2013-05-23 13:16:20.000', '2013-05-25 00:49:04.000', 127964 ),
(13813 ,'2013-05-23 12:47:57.000', '2013-05-24 15:17:29.000', 95372 ),
(14265 ,'2013-05-23 11:24:06.000', '2013-05-24 13:32:00.000', 94074 ),
(30829 ,'2013-05-23 08:16:23.000', '2013-05-24 10:13:40.000', 93437 ),
(30832 ,'2013-05-23 05:58:36.000', '2013-05-24 06:08:28.000', 86992 )

select car_id, dts.d, dts.s, dts.e from @t

cross apply (
select cast(starttime as date) sd
, cast(starttime as time) st
, cast(endtime as date) ed
, cast(endtime as time) et
, cast('23:59:59' as time)
, cast('00:00:00' as time)) dt(sd,st,ed,et, _1159, _0000)

cross apply (
select distinct dt, s, e from (values
(dt.sd, dt.st, dt._1159),
(dateadd(day, 1, dt.sd), dt._0000, dt._1159),
(dateadd(day, 2, dt.sd), dt._0000, dt._1159),
(dateadd(day, 3, dt.sd), dt._0000, dt._1159),
(dateadd(day, 4, dt.sd), dt._0000, dt._1159),
(dateadd(day, 5, dt.sd), dt._0000, dt._1159),
(dateadd(day, 6, dt.sd), dt._0000, dt._1159),
(dt.ed, dt._0000, dt.et)) d(dt, s, e)
where 1 = case
when d.dt < dt.ed then 1
when d.dt = dt.ed and d.e <= dt.et then 1
else 0
end
) dts(d, s, e)


Edit: removed unused code. Note: if you want to cover any number of days, you'll need to build the query as dynamic SQL, for the last CROSS APPLY step.
Go to Top of Page

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2014-12-17 : 12:18:01
Here's another way, using a CTE tally table. This should handle any interval up to 10^8 days"


declare @t table (car_id int, starttime datetime, endtime datetime, duration int)
insert into @t(car_id, starttime, endtime, duration) values

(64 ,'2013-05-23 13:36:42.000', '2013-05-25 04:10:51.000', 138849 ),
(103 ,'2013-05-23 20:38:43.000', '2013-05-25 03:40:55.000', 111732 ),
(104 ,'2013-05-23 14:37:41.000', '2013-05-25 04:10:02.000', 135141 ),
(171 ,'2013-05-23 11:32:21.000', '2013-05-25 04:07:20.000', 146099 ),
(189 ,'2013-05-23 16:33:27.000', '2013-05-27 11:05:44.000', 325937 ),
(3283 ,'2013-05-23 17:12:55.000', '2013-05-25 14:53:57.000', 164462 ),
(3956 ,'2013-05-23 09:49:20.000', '2013-05-24 10:29:51.000', 88831 ),
(4107 ,'2013-05-23 07:20:20.000', '2013-05-24 09:35:09.000', 94489 ),
(5237 ,'2013-05-23 13:16:20.000', '2013-05-25 00:49:04.000', 127964 ),
(13813 ,'2013-05-23 12:47:57.000', '2013-05-24 15:17:29.000', 95372 ),
(14265 ,'2013-05-23 11:24:06.000', '2013-05-24 13:32:00.000', 94074 ),
(30829 ,'2013-05-23 08:16:23.000', '2013-05-24 10:13:40.000', 93437 ),
(30832 ,'2013-05-23 05:58:36.000', '2013-05-24 06:08:28.000', 86992 )


declare @d int = 1+ (
select max(datediff(day, cast(starttime as date), cast(endtime as date)))
from @t);

with n1(n) as (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))v(n)),
n2(n) as (select 1 from n1, n1 n),
n4(n) as (select 1 from n2, n2 n),
n8(n) as (select 1 from n4, n4 n),
N(n) as (select top(@d) row_number() over(order by (select 1)) from n8, n8 n)

select car_id, dateadd(day, n-1,dt.sd) startdate
,case when dt.sd = dateadd(day, n-1,dt.sd) then dt.st else dt._0000 end startime
,case when dt.ed = dateadd(day, n-1,dt.sd) then dt.et else dt._1159 end endtime
from @t
cross apply (
select cast(starttime as date), cast(starttime as time),
cast(endtime as date), cast(endtime as time),
cast('23:59:59' as time), cast('00:00:00' as time)
) dt(sd, st, ed, et, _1159, _0000)
cross join N
where n -1 <= datediff(day, dt.sd, dt.ed)
order by car_id, startdate
Go to Top of Page

CleaningWoman
Starting Member

13 Posts

Posted - 2014-12-17 : 12:26:24
quote:
Originally posted by gbritton

Tricky! Check this out:



WOW! THANX SO MUCH. I have learned so much right now. There has opened a whole new SQL-World for me with DTS. I did not know of DTS.

But there is a minor drawback with rows where the car is returned on the same day.

If you change above code the first row of data to:

(64 ,'2013-05-23 13:36:42.000', '2013-05-23 14:10:51.000', 2049 ),


the returned results leave the first day with a wrong starttime from 00:00:00

I guess it is a minor typo, but as i do not understand what is going on, i will and have to train myself first to find it. Do you see it right away ?

Go to Top of Page

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2014-12-17 : 13:08:58
My second solution fixes that
Go to Top of Page

CleaningWoman
Starting Member

13 Posts

Posted - 2014-12-17 : 13:13:01
quote:
Originally posted by gbritton

My second solution fixes that



YES, it does !!
I already implemented your solution into my database and processed all data in 1 minute and 6 seconds .

I will go asleep tonight a different person.
And tomorrow i will use the saved time to understand what you have made me do. ;-)

THANX, THANX, THANX.
Go to Top of Page

Muj9
Yak Posting Veteran

75 Posts

Posted - 2014-12-18 : 07:10:42
your a sql beast "gbritton".

Can you walk us through your sql ?

what do all the n's mean?

Thanks
Go to Top of Page

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2014-12-18 : 09:04:42
@Muj9 Ha! I claim no credit here. All the n's are a compact form of a CTE-based tally table. Jeff Moden wrote a great article on that and how to use a tally table to parse strings:

http://www.sqlservercentral.com/articles/Tally+Table/72993/

The basic idea is to build up a sequence of enough numbers to cover what you want to do. Looking at the CTEs again:


with n1(n) as (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))v(n)),
n2(n) as (select 1 from n1, n1 n),
n4(n) as (select 1 from n2, n2 n),
n8(n) as (select 1 from n4, n4 n),
N(n) as (select top(@d) row_number() over(order by (select 1)) from n8, n8 n)


n1 is just 10 rows of the number 1. I call it n1 since there are 10^1 = 10 rows.
n2 is n1*n1 -- that is, the cartesian product of n1 with itself. Call it n1^2 if you like. Now we have 10^1 * 10^1 = 10^2 = 100 rows
n4 continues the idea, resulting in 10^4 = 10,000 rows
n8 has 10^8 rows

Finally we come to big N. I use N since it reminds me of the double-struck N, the symbol used to represent the natural numbers in mathematics (see http://en.wikipedia.org/wiki/Natural_number). This CTE selects some of the rows of n8. In this case the number of rows is controlled by the number of days (inclusive) between the earliest start date and latest end date. (In practice I suspect this will be (a lot!) shorter than 10^8). This CTE uses the SQL ROW_NUMBER function to generate the sequence we want. ORDER BY is required and ORDER BY (SELECT 1) is equivalent to no ordering. You could say (SELECT NULL) or (SELECT 0) or (SELECT $) or (SELECT 'UNORDERED') with the same result.

Continuing on, the first CROSS APPLY:


cross apply (
select cast(starttime as date), cast(starttime as time),
cast(endtime as date), cast(endtime as time),
cast('23:59:59' as time), cast('00:00:00' as time)
) dt(sd, st, ed, et, _1159, _0000)


just does typecasts and aliases the results for easy reference. Note that using CROSS APPLY this way, the optimizer will do the work in-line and only adds a minuscule constant factor to the overall cost. However, it makes the main SELECT clause simpler to write.

The CROSS JOIN just ensures that I have enough sequence numbers for each row of input. (So I can cover each day).

The main SELECT clause:


select car_id, dateadd(day, n-1,dt.sd) startdate
,case when dt.sd = dateadd(day, n-1,dt.sd) then dt.st else dt._0000 end startime
,case when dt.ed = dateadd(day, n-1,dt.sd) then dt.et else dt._1159 end endtime


gets the car id and start date. Then it computes the start and end times. The idea is that if this is day 1 (dt.sd = dateadd(day, n-1,dt.sd)), the start time comes from the input data, otherwise it is midnight (00:00:00). Same idea with the end time.

PS: I just realized that I can simplify this even further and make it DRY-er by reordering the CROSS JOIN and CROSS APPLY and pre-computing dateadd(day, n-1,dt.sd):


select car_id, day_n startdate
,case when dt.sd = day_n then dt.st else dt._0000 end startime
,case when dt.ed = day_n then dt.et else dt._1159 end endtime
from @t
cross join N
cross apply (
select cast(starttime as date), cast(starttime as time),
cast(endtime as date), cast(endtime as time),
cast('23:59:59' as time), cast('00:00:00' as time),
cast(dateadd(day, n-1,starttime) as date)
) dt(sd, st, ed, et, _1159, _0000, day_n)

where n -1 <= datediff(day, dt.sd, dt.ed)
order by car_id, startdate
Go to Top of Page

Muj9
Yak Posting Veteran

75 Posts

Posted - 2014-12-18 : 09:23:25
Thank you for the explanation its very impressive. Just Brilliant.
Go to Top of Page

CleaningWoman
Starting Member

13 Posts

Posted - 2015-01-08 : 10:22:45
quote:
Originally posted by gbritton

My second solution



Thank you again. I studied a while on it, i tested a few different szenarios and learned very much. The routine is implemented and working stable and fine.

And i will build a training about "CTE" and "cross apply" out of it for my colleagues.

Thanks a lot.
(I was of on x-mas holidays and now as i am back, otherwise i would have answered earlier)
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2015-01-08 : 12:45:23
Keep it simple. I expect this code to run much faster than 66 seconds.
-- SwePeso
SELECT t.Car_ID,
DATEADD(DAY, DATEDIFF(DAY, '19000101', t.StartTime), v.Number) AS Date_ID,
CASE
WHEN v.Number = 0 THEN CAST(t.StartTime AS TIME(3))
ELSE CAST('00:00:00.000' AS TIME(3))
END AS StartTime,
CASE
WHEN v.Number = DATEDIFF(DAY, t.StartTime, t.EndTime) THEN CAST(t.EndTime AS TIME(3))
ELSE CAST('23:59:59.000' AS TIME(3))
END AS EndTime
FROM @t AS t
INNER JOIN master.dbo.spt_values AS v ON v.Type = N'P'
AND v.Number BETWEEN 0 AND DATEDIFF(DAY, t.StartTime, t.EndTime);



Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2015-01-08 : 12:59:58
Even faster.

First create this function in your database.
CREATE FUNCTION [dbo].[GetNums](@low AS BIGINT, @high AS BIGINT) RETURNS TABLE
AS
RETURN
WITH
L0 AS (SELECT c FROM (SELECT 1 UNION ALL SELECT 1) AS D(c)),
L1 AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
L2 AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
L3 AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
L4 AS (SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
L5 AS (SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
Nums AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
FROM L5)
SELECT TOP(@high - @low + 1) @low + rownum - 1 AS n
FROM Nums
ORDER BY rownum;
GO
And then run this code
-- SwePeso
SELECT t.Car_ID,
DATEADD(DAY, DATEDIFF(DAY, '19000101', t.StartTime), v.n) AS Date_ID,
CASE
WHEN v.n = 0 THEN CAST(t.StartTime AS TIME(3))
ELSE CAST('00:00:00.000' AS TIME(3))
END AS StartTime,
CASE
WHEN v.n = DATEDIFF(DAY, t.StartTime, t.EndTime) THEN CAST(t.EndTime AS TIME(3))
ELSE CAST('23:59:59.000' AS TIME(3))
END AS EndTime
FROM @t AS t
CROSS APPLY dbo.GetNums(0, DATEDIFF(DAY, t.StartTime, t.EndTime)) AS v;


Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA
Go to Top of Page
   

- Advertisement -