Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.
DLTaylor
Posting Yak Master
136 Posts
Posted - 2009-02-11 : 05:15:45
I have a table containing activity with a start and end datetime for a 1-year period (see [dbo].[Activity] for sample data). It contains 70,000 records over the year.

My aim is to use this data to work out trends in activity, broken down by quarter, day of week, and hour of day. E.g. on winter Mondays we are busiest between the hours of 9 and 12, and very quiet on summer Sundays between 21 and 23 hours.

I want to use the historical data to generate a 'benchmark' table which can be used for comparisons. I envisage that the benchmark table will end up with 672 rows: 4 (quarters) x 7 (days of week) x 24 (hours in the day).

The live activity can then reference (inner join) the benchmark table on the live quarter, day of week, and hour of day, and provide me with a comparison to the typical average.

I assure you this is not a homework question (though it possibly sounds like one!). I'm just not sure about the best way to go about it (without writing a load of UPDATE statements!). I have used this forum before and been amazed with the quality of replies, and I was hoping somebody would understand what I want and have the mathematical and SQL brain to work out how to do it. Any help will be really appreciated.

See code below:

    IF OBJECT_ID(N'Activity', N'U') IS NOT NULL
        DROP TABLE [dbo].[Activity]
    GO
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE TABLE [dbo].[Activity]
    (
        [Startdt] [datetime] NULL,
        [Enddt] [datetime] NULL
    ) ON [PRIMARY]

    INSERT INTO Activity
    SELECT '2008-04-19 08:23:00.000', '2008-04-19 12:30:00.000' UNION ALL
    SELECT '2008-04-19 08:53:00.000', '2008-04-19 11:21:00.000' UNION ALL
    SELECT '2008-04-19 08:59:00.000', '2008-04-19 18:22:00.000' UNION ALL
    SELECT '2008-04-19 09:21:00.000', '2008-04-19 16:22:00.000' UNION ALL
    SELECT '2008-04-19 11:20:00.000', '2008-04-19 12:20:00.000' UNION ALL
    SELECT '2008-04-19 11:33:00.000', '2008-04-19 13:22:00.000' UNION ALL
    SELECT '2008-04-19 12:00:00.000', '2008-04-19 16:11:00.000' UNION ALL
    SELECT '2008-04-19 12:31:00.000', '2008-04-19 17:10:00.000'

    SELECT * FROM Activity

    IF OBJECT_ID(N'ActivityTrend', N'U') IS NOT NULL
        DROP TABLE [dbo].[ActivityTrend]
    CREATE TABLE [dbo].[ActivityTrend]
    (
        [Qtr] [int] NULL,
        [DayOfWeek] [int] NULL,
        [Hour] [int] NULL,
        [count] [int] NULL
    ) ON [PRIMARY]

    INSERT INTO [ActivityTrend]
    SELECT '2', '7', '8', '3' UNION ALL
    SELECT '2', '7', '9', '4' UNION ALL
    SELECT '2', '7', '10', '4' UNION ALL
    SELECT '2', '7', '11', '5' UNION ALL
    SELECT '2', '7', '12', '6' UNION ALL
    SELECT '2', '7', '13', '3' UNION ALL
    SELECT '2', '7', '14', '3' UNION ALL
    SELECT '2', '7', '15', '4' UNION ALL
    SELECT '2', '7', '16', '4' UNION ALL
    SELECT '2', '7', '17', '2' UNION ALL
    SELECT '2', '7', '18', '1' UNION ALL
    SELECT '2', '7', '19', '0'

    -- table type I would like to create
    SELECT * FROM ActivityTrend

    IF OBJECT_ID(N'ActivityTrend', N'U') IS NOT NULL
        DROP TABLE [dbo].[ActivityTrend]
    IF OBJECT_ID(N'Activity', N'U') IS NOT NULL
        DROP TABLE [dbo].[Activity]
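[Editor's note] The comparison join described in the question is not shown in the thread; a minimal sketch might look like the following. `LiveActivity` and its `Startdt` column are hypothetical names standing in for whatever live table is used; the benchmark table follows the `ActivityTrend` layout above.

```sql
-- Hypothetical sketch: compare live rows against the benchmark by
-- quarter, weekday and hour. LiveActivity is an assumed table name.
SELECT  la.Startdt,
        bt.[count] AS BenchmarkCount
FROM    LiveActivity AS la
INNER JOIN ActivityTrend AS bt
        ON  bt.[Qtr]       = DATEPART(QUARTER, la.Startdt)
        AND bt.[DayOfWeek] = DATEPART(WEEKDAY, la.Startdt)
        AND bt.[Hour]      = DATEPART(HOUR, la.Startdt)
```

Note that DATEPART(WEEKDAY, ...) depends on the session's DATEFIRST setting, so the benchmark and live sides must be built under the same setting.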
Lumbago
Norsk Yak Master
3271 Posts
Posted - 2009-02-11 : 07:45:56
So from what I understand, you actually want to populate the ActivityTrend table with data aggregated from the Activity table, right? I would alter the ActivityTrend table slightly, but hopefully this is what you want:

    CREATE TABLE [dbo].[ActivityTrend]
    (
        [Date] datetime NOT NULL,
        [Year] [int] NOT NULL,
        [Quarter] [int] NOT NULL,
        [Month] [int] NOT NULL,
        [Week] [int] NOT NULL,
        [DayOfWeek] [int] NOT NULL,
        [Day] [int] NOT NULL,
        [Hour] [int] NOT NULL,
        [Count] [int] NULL
    ) ON [PRIMARY]

    DECLARE @myDatetime datetime
    SET @myDatetime = '2008-01-01 00:00:00'

    WHILE (@myDatetime < '2009-01-01 00:00:00')
    BEGIN
        INSERT INTO ActivityTrend
            ([Date], [Year], [Quarter], [Month], [Week], [DayOfWeek], [Day], [Hour])
        SELECT
            @myDatetime,
            DATEPART(yyyy, @myDatetime),
            DATEPART(q, @myDatetime),
            DATEPART(mm, @myDatetime),
            DATEPART(wk, @myDatetime),
            DATEPART(dw, @myDatetime),
            DATEPART(dd, @myDatetime),
            DATEPART(hh, @myDatetime)

        SET @myDatetime = DATEADD(hh, 1, @myDatetime)
    END

    UPDATE c
    SET c.[Count] = dt.[Count]
    FROM ActivityTrend c
    INNER JOIN
    (
        SELECT b.[Date], [Count] = COUNT(*)
        FROM Activity a
        INNER JOIN ActivityTrend b ON b.[Date] BETWEEN a.StartDt AND a.EndDt
        GROUP BY b.[Date]
    ) AS dt ON c.[Date] = dt.[Date]

    SELECT * FROM ActivityTrend WHERE [Count] IS NOT NULL

- Lumbago
DLTaylor
Posting Yak Master
136 Posts
Posted - 2009-02-11 : 08:28:38
Amazing! It works perfectly. You hit the nail on the head: this is what I wanted, only MORE powerful, as I can generate better aggregates than my original table. Thank you for your help; I will ensure you get the credit in my final code. Thanks again.
Lumbago
Norsk Yak Master
3271 Posts
Posted - 2009-02-11 : 08:39:30
Great! But did you understand the solution? It's actually extremely simple, as long as you add all the datetimes prior to doing the actual count. By doing this you will also get timeslots where there are no activities, which is also valuable information. And by adding Year to the trend table you can keep on collecting data and compare this year's data to last year's, etc.

- Lumbago
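[Editor's note] The year-over-year comparison hinted at here is not spelled out in the thread; against the ActivityTrend layout from the earlier post it could be sketched like this. The query is a hypothetical illustration, not code from the thread.

```sql
-- Hypothetical sketch: compare each calendar hour's count this year
-- against the same month/day/hour one year earlier, using the
-- ActivityTrend layout (Year, Month, Day, Hour, Count) from above.
SELECT  cur.[Month], cur.[Day], cur.[Hour],
        cur.[Count]  AS ThisYear,
        prev.[Count] AS LastYear
FROM    ActivityTrend AS cur
INNER JOIN ActivityTrend AS prev
        ON  prev.[Year]  = cur.[Year] - 1
        AND prev.[Month] = cur.[Month]
        AND prev.[Day]   = cur.[Day]
        AND prev.[Hour]  = cur.[Hour]
WHERE   cur.[Count] IS NOT NULL
```

Matching on month/day/hour compares the same calendar slot across years; matching on quarter/weekday/hour instead would compare like-for-like weekdays, which may suit trend analysis better.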
DLTaylor
Posting Yak Master
136 Posts
Posted - 2009-02-11 : 09:44:12
Absolutely (after a little reading, of course :-)). It's a great solution. The addition of ALL datetimes makes it much more robust, and the year allows for additional time periods (and I will be able to get better benchmark data :-) ). The solution you have given will work for me on a number of datasets, and I can see it being used as a 'plug-in' for lots of activity with start and end times! I have already shown it to my colleagues, who are really impressed with your work! Thanks
Lumbago
Norsk Yak Master
3271 Posts
Posted - 2009-02-11 : 09:54:50
Really great that it works for you, and thanks for the kind words :)

- Lumbago
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2009-02-11 : 10:20:25
I think your ActivityTrend is faulty, at least with the provided sample data. Is this what you want?

    SELECT '20080419 08:23', '20080419 12:30' UNION ALL -- 8, 9, 10, 11, 12
    SELECT '20080419 08:53', '20080419 11:21' UNION ALL -- 8, 9, 10, 11
    SELECT '20080419 08:59', '20080419 18:22' UNION ALL -- 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
    SELECT '20080419 09:21', '20080419 16:22' UNION ALL -- 9, 10, 11, 12, 13, 14, 15, 16
    SELECT '20080419 11:20', '20080419 12:20' UNION ALL -- 11, 12
    SELECT '20080419 11:33', '20080419 13:22' UNION ALL -- 11, 12, 13
    SELECT '20080419 12:00', '20080419 16:11' UNION ALL -- 12, 13, 14, 15, 16
    SELECT '20080419 12:31', '20080419 17:10'           -- 12, 13, 14, 15, 16, 17

    -- Hour   Me   You
    -- 8      3    3
    -- 9      4    4
    -- 10     4    4
    -- 11     6    5   <- difference of 1
    -- 12     7    6   <- difference of 1
    -- 13     5    3   <- difference of 2
    -- 14     4    3   <- difference of 1
    -- 15     4    4
    -- 16     4    4
    -- 17     2    2
    -- 18     1    1

E 12°55'05.63" N 56°04'39.26"
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2009-02-11 : 10:23:52
Try this set-based solution. No loops, no cursors.

    DECLARE @Sample TABLE
    (
        StartDT DATETIME,
        EndDT DATETIME
    )

    INSERT @Sample
    SELECT '20080419 08:23', '20080419 12:30' UNION ALL
    SELECT '20080419 08:53', '20080419 11:21' UNION ALL
    SELECT '20080419 08:59', '20080419 18:22' UNION ALL
    SELECT '20080419 09:21', '20080419 16:22' UNION ALL
    SELECT '20080419 11:20', '20080419 12:20' UNION ALL
    SELECT '20080419 11:33', '20080419 13:22' UNION ALL
    SELECT '20080419 12:00', '20080419 16:11' UNION ALL
    SELECT '20080419 12:31', '20080419 17:10'

    -- Old way
    SELECT  DATEPART(QUARTER, theTime) AS theQuarter,
            DATEPART(WEEKDAY, theTime) AS theWeekday,
            DATEPART(HOUR, theTime) AS theHour,
            COUNT(*) AS Cnt
    FROM    (
                SELECT DATEADD(HOUR, v.Number, s.StartDT) AS theTime
                FROM   (
                           SELECT DATEADD(HOUR, DATEDIFF(HOUR, 0, StartDT), 0) AS StartDT,
                                  DATEDIFF(HOUR, StartDT, EndDT) AS Hrs
                           FROM @Sample
                       ) AS s
                INNER JOIN master..spt_values AS v ON v.Type = 'P'
                WHERE  v.Number <= s.Hrs
            ) AS d
    GROUP BY DATEPART(QUARTER, theTime),
             DATEPART(WEEKDAY, theTime),
             DATEPART(HOUR, theTime)

E 12°55'05.63" N 56°04'39.26"
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2009-02-11 : 10:33:06
Here is another approach which you might not have thought of. Why have a static table grouped by a few fixed intervals such as quarter, weekday, and hour? Use the highest precision available, always!

    -- Prepare sample data
    DECLARE @Sample TABLE
    (
        StartDT DATETIME,
        EndDT DATETIME
    )

    INSERT @Sample
    SELECT '20080419 08:23', '20080419 12:30' UNION ALL
    SELECT '20080419 08:53', '20080419 11:21' UNION ALL
    SELECT '20080419 08:59', '20080419 18:22' UNION ALL
    SELECT '20080419 09:21', '20080419 16:22' UNION ALL
    SELECT '20080419 11:20', '20080419 12:20' UNION ALL
    SELECT '20080419 11:33', '20080419 13:22' UNION ALL
    SELECT '20080419 12:00', '20080419 16:11' UNION ALL
    SELECT '20080419 12:31', '20080419 17:10'

    -- Prepare staging table
    DECLARE @Stage TABLE
    (
        RecID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
        FromTime DATETIME NOT NULL,
        ToTime DATETIME NOT NULL
    )

    -- Populate staging table with every distinct boundary time
    INSERT @Stage (FromTime, ToTime)
    SELECT u.theTime, u.theTime
    FROM @Sample AS s
    UNPIVOT (theTime FOR theCol IN ([StartDT], [EndDT])) AS u
    GROUP BY u.theTime
    ORDER BY u.theTime

    -- Update with closest range
    UPDATE s
    SET s.ToTime = w.FromTime
    FROM @Stage AS s
    INNER JOIN @Stage AS w ON w.RecID = s.RecID + 1

    -- Delete last time
    DELETE
    FROM @Stage
    WHERE RecID = SCOPE_IDENTITY()

    -- Display the result
    SELECT w.FromTime, w.ToTime, COUNT(*) AS Occurencies
    FROM @Sample AS s
    INNER JOIN @Stage AS w ON w.FromTime < s.EndDT AND w.ToTime > s.StartDT
    GROUP BY w.FromTime, w.ToTime
    HAVING COUNT(*) > 1
    ORDER BY w.FromTime

With this resultset you can easily send in any datetime value and get back the highest occurrence count.

    SELECT Occurencies
    FROM MyStatisticsTable
    WHERE @MyDate BETWEEN FromTime AND ToTime

E 12°55'05.63" N 56°04'39.26"
DLTaylor
Posting Yak Master
136 Posts
Posted - 2009-02-12 : 06:07:27
OK, now I'm feeling a bit humbled. Peso, you are right about the sample data provided; it was not totally accurate (ouch). And I won't deny that your version (posted 02/11/2009 : 10:23:52) does do what I originally requested!

The problem I have (apart from this type of work being new and infrequent to me!) is that I don't have anyone to bounce my ideas around with at work, and if I had thought it through I may have tried asking for the solution Lumbago provided.

That said, I have looked at your other post, Peso (on 02/11/2009 10:33). While it is a different approach, it will be more accurate, and I can see the benefits of your logic.

I may look to add minutes to the Lumbago solution (though for 1 year that will give me 524,160 rows of benchmark data, which I will group, but it may well slow my process overall... there is always much to consider).

Again, thank you both for your help. I have learnt lots again, and I know I will be able to provide a decent solution with your help.
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2009-02-12 : 06:19:57
quote: Originally posted by DLTaylor
"The problem I have (apart from this type of work being new and infrequent to me!) is that I don't have anyone to bounce my ideas around with at work, and if I had thought it through I may have tried asking for the solution Lumbago provided."

That's OK. Someone on SQLTeam will always have time to discuss your ideas.

quote: Originally posted by DLTaylor
"That said, I have looked at your other post, Peso (on 02/11/2009 10:33). While it is a different approach, it will be more accurate, and I can see the benefits of your logic."

There are many benefits! But it is not as useful as Lumbago's and my first suggestion if you want to display the concurrency as a timeline. The real benefit is the speed and accuracy you get when querying the concurrency for any given time!

quote: Originally posted by DLTaylor
"I may look to add minutes to the Lumbago solution (though for 1 year that will give me 524,160 rows of benchmark data, which I will group, but it may well slow my process overall... there is always much to consider)."

There is no need to store zero values, really (unless you want to display the data in a timelined fashion, as in a graph component). If you intend to make this a stored procedure, any "non-hit" which results in a zero concurrency can be handled by the stored procedure. I guess it all breaks down to what you are going to do with the statistics.

E 12°55'05.63" N 56°04'39.26"
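[Editor's note] One way a stored procedure could turn such a "non-hit" into an explicit zero is sketched below. This is a hypothetical illustration only, reusing the MyStatisticsTable and Occurencies names from the earlier post.

```sql
-- Sketch: return 0 instead of an empty resultset when no stored
-- range covers @MyDate. MAX over zero rows yields NULL, which
-- COALESCE then maps to 0.
SELECT COALESCE(MAX(Occurencies), 0) AS Occurencies
FROM MyStatisticsTable
WHERE @MyDate BETWEEN FromTime AND ToTime
```

This keeps the statistics table small (no zero rows stored) while callers still always get a number back.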
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2009-02-12 : 06:26:09
Here is how you get all peak concurrencies in one select with my second suggestion.

    SELECT TOP 1 WITH TIES Occurencies
    FROM MyStatisticsTable
    ORDER BY Occurencies DESC

E 12°55'05.63" N 56°04'39.26"