Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.
DLTaylor
Posting Yak Master
136 Posts
Posted - 2009-02-11 : 05:15:45
I have a table containing activity with a start and end datetime for a 1-year period (see [dbo].[Activity] for sample data). It contains 70,000 records over the year.

My aim is to use this data to work out trends in activity, broken down by quarter, day of week, and hour of day. E.g. on winter Mondays we are busiest between the hours of 9 and 12, and very quiet on summer Sundays between 21 and 23 hours.

I want to use the historical data to generate a 'benchmark' table which can be used for comparisons. I envisage that the benchmark table will end up with 672 rows: 4 (quarters) x 7 (days of week) x 24 (hours in the day).

The live activity can then reference (inner join) the benchmark table on the live quarter, day of week, and hour of day, and provide me with a comparison to the typical average.

I assure you this is not a homework question (though it possibly sounds like one!). I'm just not sure about the best way to go about it (without writing a load of UPDATE statements!). I have used this forum before and been amazed with the quality of replies, and I was hoping somebody would understand what I want and have the mathematical and SQL brain to work out how to do it. Any help will be really appreciated.

See code below:

    IF OBJECT_ID(N'Activity', N'U') IS NOT NULL
        DROP TABLE [dbo].[Activity]
    GO
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE TABLE [dbo].[Activity]
    (
        [Startdt] [datetime] NULL,
        [Enddt] [datetime] NULL
    ) ON [PRIMARY]

    INSERT INTO Activity
    SELECT '2008-04-19 08:23:00.000', '2008-04-19 12:30:00.000' UNION ALL
    SELECT '2008-04-19 08:53:00.000', '2008-04-19 11:21:00.000' UNION ALL
    SELECT '2008-04-19 08:59:00.000', '2008-04-19 18:22:00.000' UNION ALL
    SELECT '2008-04-19 09:21:00.000', '2008-04-19 16:22:00.000' UNION ALL
    SELECT '2008-04-19 11:20:00.000', '2008-04-19 12:20:00.000' UNION ALL
    SELECT '2008-04-19 11:33:00.000', '2008-04-19 13:22:00.000' UNION ALL
    SELECT '2008-04-19 12:00:00.000', '2008-04-19 16:11:00.000' UNION ALL
    SELECT '2008-04-19 12:31:00.000', '2008-04-19 17:10:00.000'

    SELECT * FROM Activity

    IF OBJECT_ID(N'ActivityTrend', N'U') IS NOT NULL
        DROP TABLE [dbo].[ActivityTrend]
    CREATE TABLE [dbo].[ActivityTrend]
    (
        [Qtr] [int] NULL,
        [DayOfWeek] [int] NULL,
        [Hour] [int] NULL,
        [count] [int] NULL
    ) ON [PRIMARY]

    INSERT INTO [ActivityTrend]
    SELECT '2', '7', '8', '3' UNION ALL
    SELECT '2', '7', '9', '4' UNION ALL
    SELECT '2', '7', '10', '4' UNION ALL
    SELECT '2', '7', '11', '5' UNION ALL
    SELECT '2', '7', '12', '6' UNION ALL
    SELECT '2', '7', '13', '3' UNION ALL
    SELECT '2', '7', '14', '3' UNION ALL
    SELECT '2', '7', '15', '4' UNION ALL
    SELECT '2', '7', '16', '4' UNION ALL
    SELECT '2', '7', '17', '2' UNION ALL
    SELECT '2', '7', '18', '1' UNION ALL
    SELECT '2', '7', '19', '0'

    -- table type I would like to create
    SELECT * FROM ActivityTrend

    IF OBJECT_ID(N'ActivityTrend', N'U') IS NOT NULL
        DROP TABLE [dbo].[ActivityTrend]
    IF OBJECT_ID(N'Activity', N'U') IS NOT NULL
        DROP TABLE [dbo].[Activity]
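[Editor's note] The comparison join described in the question is not shown in the thread; a minimal sketch might look like the following. `LiveActivity` and its `Startdt` column are hypothetical names standing in for whatever live table is used; the benchmark table follows the `ActivityTrend` layout above.

```sql
-- Hypothetical sketch: compare live rows against the benchmark by
-- quarter, weekday and hour. LiveActivity is an assumed table name.
SELECT  la.Startdt,
        bt.[count] AS BenchmarkCount
FROM    LiveActivity AS la
INNER JOIN ActivityTrend AS bt
        ON  bt.[Qtr]       = DATEPART(QUARTER, la.Startdt)
        AND bt.[DayOfWeek] = DATEPART(WEEKDAY, la.Startdt)
        AND bt.[Hour]      = DATEPART(HOUR, la.Startdt)
```

Note that DATEPART(WEEKDAY, ...) depends on the session's DATEFIRST setting, so the benchmark and live sides must be built under the same setting.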
Lumbago
Norsk Yak Master
3271 Posts
Posted - 2009-02-11 : 07:45:56
So from what I understand, you actually want to populate the ActivityTrend table with data aggregated from the Activity table, right? I would alter the ActivityTrend table slightly, but hopefully this is what you want:

    CREATE TABLE [dbo].[ActivityTrend]
    (
        [Date] datetime NOT NULL,
        [Year] [int] NOT NULL,
        [Quarter] [int] NOT NULL,
        [Month] [int] NOT NULL,
        [Week] [int] NOT NULL,
        [DayOfWeek] [int] NOT NULL,
        [Day] [int] NOT NULL,
        [Hour] [int] NOT NULL,
        [Count] [int] NULL
    ) ON [PRIMARY]

    DECLARE @myDatetime datetime
    SET @myDatetime = '2008-01-01 00:00:00'

    WHILE (@myDatetime < '2009-01-01 00:00:00')
    BEGIN
        INSERT INTO ActivityTrend
            ([Date], [Year], [Quarter], [Month], [Week], [DayOfWeek], [Day], [Hour])
        SELECT
            @myDatetime,
            DATEPART(yyyy, @myDatetime),
            DATEPART(q, @myDatetime),
            DATEPART(mm, @myDatetime),
            DATEPART(wk, @myDatetime),
            DATEPART(dw, @myDatetime),
            DATEPART(dd, @myDatetime),
            DATEPART(hh, @myDatetime)

        SET @myDatetime = DATEADD(hh, 1, @myDatetime)
    END

    UPDATE c
    SET c.[Count] = dt.[Count]
    FROM ActivityTrend c
    INNER JOIN
    (
        SELECT b.[Date], [Count] = COUNT(*)
        FROM Activity a
        INNER JOIN ActivityTrend b ON b.[Date] BETWEEN a.StartDt AND a.EndDt
        GROUP BY b.[Date]
    ) AS dt ON c.[Date] = dt.[Date]

    SELECT * FROM ActivityTrend WHERE [Count] IS NOT NULL

- Lumbago
DLTaylor
Posting Yak Master
136 Posts
Posted - 2009-02-11 : 08:28:38
Amazing! It works perfectly. You hit the nail on the head: this is what I wanted, only MORE powerful, as I can generate better aggregates than my original table. Thank you for your help; I will ensure you get the credit in my final code. Thanks again.
Lumbago
Norsk Yak Master
3271 Posts
Posted - 2009-02-11 : 08:39:30
Great! But did you understand the solution? It's actually extremely simple, as long as you add all the datetimes prior to doing the actual count. By doing this you will also get timeslots where there are no activities, which is also valuable information. And by adding Year to the trend table you can keep on collecting data and compare this year's data to last year's, etc.

- Lumbago
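[Editor's note] The year-over-year comparison hinted at here is not spelled out in the thread; against the ActivityTrend layout from the earlier post it could be sketched like this. The query is a hypothetical illustration, not code from the thread.

```sql
-- Hypothetical sketch: compare each calendar hour's count this year
-- against the same month/day/hour one year earlier, using the
-- ActivityTrend layout (Year, Month, Day, Hour, Count) from above.
SELECT  cur.[Month], cur.[Day], cur.[Hour],
        cur.[Count]  AS ThisYear,
        prev.[Count] AS LastYear
FROM    ActivityTrend AS cur
INNER JOIN ActivityTrend AS prev
        ON  prev.[Year]  = cur.[Year] - 1
        AND prev.[Month] = cur.[Month]
        AND prev.[Day]   = cur.[Day]
        AND prev.[Hour]  = cur.[Hour]
WHERE   cur.[Count] IS NOT NULL
```

Matching on month/day/hour compares the same calendar slot across years; matching on quarter/weekday/hour instead would compare like-for-like weekdays, which may suit trend analysis better.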
DLTaylor
Posting Yak Master
136 Posts
Posted - 2009-02-11 : 09:44:12
Absolutely (after a little reading, of course :-)). It's a great solution. The addition of ALL datetimes makes it much more robust, and the year allows for additional time periods (and I will be able to get better benchmark data :-) ). The solution you have given will work for me on a number of datasets, and I can see it being used as a 'plug-in' for lots of activity with start and end times! I have already shown it to my colleagues, who are really impressed with your work! Thanks
Lumbago
Norsk Yak Master
3271 Posts
Posted - 2009-02-11 : 09:54:50
Really great that it works for you, and thanks for the kind words :)

- Lumbago
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2009-02-11 : 10:20:25
I think your ActivityTrend is faulty, at least with the provided sample data. Is this what you want?

    SELECT '20080419 08:23', '20080419 12:30' UNION ALL -- 8, 9, 10, 11, 12
    SELECT '20080419 08:53', '20080419 11:21' UNION ALL -- 8, 9, 10, 11
    SELECT '20080419 08:59', '20080419 18:22' UNION ALL -- 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
    SELECT '20080419 09:21', '20080419 16:22' UNION ALL -- 9, 10, 11, 12, 13, 14, 15, 16
    SELECT '20080419 11:20', '20080419 12:20' UNION ALL -- 11, 12
    SELECT '20080419 11:33', '20080419 13:22' UNION ALL -- 11, 12, 13
    SELECT '20080419 12:00', '20080419 16:11' UNION ALL -- 12, 13, 14, 15, 16
    SELECT '20080419 12:31', '20080419 17:10'           -- 12, 13, 14, 15, 16, 17

    -- Hour   Me   You
    -- 8      3    3
    -- 9      4    4
    -- 10     4    4
    -- 11     6    5   <- difference of 1
    -- 12     7    6   <- difference of 1
    -- 13     5    3   <- difference of 2
    -- 14     4    3   <- difference of 1
    -- 15     4    4
    -- 16     4    4
    -- 17     2    2
    -- 18     1    1

E 12°55'05.63" N 56°04'39.26"
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2009-02-11 : 10:23:52
Try this set-based solution. No loops, no cursors.

    DECLARE @Sample TABLE
    (
        StartDT DATETIME,
        EndDT DATETIME
    )

    INSERT @Sample
    SELECT '20080419 08:23', '20080419 12:30' UNION ALL
    SELECT '20080419 08:53', '20080419 11:21' UNION ALL
    SELECT '20080419 08:59', '20080419 18:22' UNION ALL
    SELECT '20080419 09:21', '20080419 16:22' UNION ALL
    SELECT '20080419 11:20', '20080419 12:20' UNION ALL
    SELECT '20080419 11:33', '20080419 13:22' UNION ALL
    SELECT '20080419 12:00', '20080419 16:11' UNION ALL
    SELECT '20080419 12:31', '20080419 17:10'

    -- Old way
    SELECT  DATEPART(QUARTER, theTime) AS theQuarter,
            DATEPART(WEEKDAY, theTime) AS theWeekday,
            DATEPART(HOUR, theTime) AS theHour,
            COUNT(*) AS Cnt
    FROM    (
                SELECT DATEADD(HOUR, v.Number, s.StartDT) AS theTime
                FROM   (
                           SELECT DATEADD(HOUR, DATEDIFF(HOUR, 0, StartDT), 0) AS StartDT,
                                  DATEDIFF(HOUR, StartDT, EndDT) AS Hrs
                           FROM @Sample
                       ) AS s
                INNER JOIN master..spt_values AS v ON v.Type = 'P'
                WHERE  v.Number <= s.Hrs
            ) AS d
    GROUP BY DATEPART(QUARTER, theTime),
             DATEPART(WEEKDAY, theTime),
             DATEPART(HOUR, theTime)

E 12°55'05.63" N 56°04'39.26"
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2009-02-11 : 10:33:06
Here is another approach which you might not have thought of. Why have a static table grouped by a few fixed intervals such as quarter, weekday, and hour? Use the highest precision available, always!

    -- Prepare sample data
    DECLARE @Sample TABLE
    (
        StartDT DATETIME,
        EndDT DATETIME
    )

    INSERT @Sample
    SELECT '20080419 08:23', '20080419 12:30' UNION ALL
    SELECT '20080419 08:53', '20080419 11:21' UNION ALL
    SELECT '20080419 08:59', '20080419 18:22' UNION ALL
    SELECT '20080419 09:21', '20080419 16:22' UNION ALL
    SELECT '20080419 11:20', '20080419 12:20' UNION ALL
    SELECT '20080419 11:33', '20080419 13:22' UNION ALL
    SELECT '20080419 12:00', '20080419 16:11' UNION ALL
    SELECT '20080419 12:31', '20080419 17:10'

    -- Prepare staging table
    DECLARE @Stage TABLE
    (
        RecID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
        FromTime DATETIME NOT NULL,
        ToTime DATETIME NOT NULL
    )

    -- Populate staging table with every distinct boundary time
    INSERT @Stage (FromTime, ToTime)
    SELECT u.theTime, u.theTime
    FROM @Sample AS s
    UNPIVOT (theTime FOR theCol IN ([StartDT], [EndDT])) AS u
    GROUP BY u.theTime
    ORDER BY u.theTime

    -- Update with closest range
    UPDATE s
    SET s.ToTime = w.FromTime
    FROM @Stage AS s
    INNER JOIN @Stage AS w ON w.RecID = s.RecID + 1

    -- Delete last time
    DELETE
    FROM @Stage
    WHERE RecID = SCOPE_IDENTITY()

    -- Display the result
    SELECT w.FromTime, w.ToTime, COUNT(*) AS Occurencies
    FROM @Sample AS s
    INNER JOIN @Stage AS w ON w.FromTime < s.EndDT AND w.ToTime > s.StartDT
    GROUP BY w.FromTime, w.ToTime
    HAVING COUNT(*) > 1
    ORDER BY w.FromTime

With this resultset you can easily send in any datetime value and get back the highest occurrence count.

    SELECT Occurencies
    FROM MyStatisticsTable
    WHERE @MyDate BETWEEN FromTime AND ToTime

E 12°55'05.63" N 56°04'39.26"
DLTaylor
Posting Yak Master
136 Posts
Posted - 2009-02-12 : 06:07:27
OK, now I'm feeling a bit humbled. Peso, you are right about the sample data provided; it was not totally accurate (ouch). And I won't deny that your version (posted 02/11/2009 : 10:23:52) does do what I originally requested!

The problem I have (apart from this type of work being new and infrequent to me!) is that I don't have anyone to bounce my ideas around with at work, and if I had thought it through I may have tried asking for the solution Lumbago provided.

That said, I have looked at your other post, Peso (on 02/11/2009 10:33). While it is a different approach, it will be more accurate, and I can see the benefits of your logic.

I may look to add minutes to the Lumbago solution (though for 1 year that will give me 524,160 rows of benchmark data, which I will group, but it may well slow my process overall... there is always much to consider).

Again, thank you both for your help. I have learnt lots again, and I know I will be able to provide a decent solution with your help.
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2009-02-12 : 06:19:57
quote: Originally posted by DLTaylor
"The problem I have (apart from this type of work being new and infrequent to me!) is that I don't have anyone to bounce my ideas around with at work, and if I had thought it through I may have tried asking for the solution Lumbago provided."

That's OK. Someone on SQLTeam will always have time to discuss your ideas.

quote: Originally posted by DLTaylor
"That said, I have looked at your other post, Peso (on 02/11/2009 10:33). While it is a different approach, it will be more accurate, and I can see the benefits of your logic."

There are many benefits! But it is not as useful as Lumbago's and my first suggestion if you want to display the concurrency as a timeline. The real benefit is the speed and accuracy you get when querying the concurrency for any given time!

quote: Originally posted by DLTaylor
"I may look to add minutes to the Lumbago solution (though for 1 year that will give me 524,160 rows of benchmark data, which I will group, but it may well slow my process overall... there is always much to consider)."

There is no need to store zero values, really (unless you want to display the data in a timelined fashion, as in a graph component). If you intend to make this a stored procedure, any "non-hit" which results in a zero concurrency can be handled by the stored procedure. I guess it all breaks down to what you are going to do with the statistics.

E 12°55'05.63" N 56°04'39.26"
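[Editor's note] One way a stored procedure could turn such a "non-hit" into an explicit zero is sketched below. This is a hypothetical illustration only, reusing the MyStatisticsTable and Occurencies names from the earlier post.

```sql
-- Sketch: return 0 instead of an empty resultset when no stored
-- range covers @MyDate. MAX over zero rows yields NULL, which
-- COALESCE then maps to 0.
SELECT COALESCE(MAX(Occurencies), 0) AS Occurencies
FROM MyStatisticsTable
WHERE @MyDate BETWEEN FromTime AND ToTime
```

This keeps the statistics table small (no zero rows stored) while callers still always get a number back.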
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2009-02-12 : 06:26:09
Here is how you get all peak concurrencies in one select with my second suggestion.

    SELECT TOP 1 WITH TIES Occurencies
    FROM MyStatisticsTable
    ORDER BY Occurencies DESC

E 12°55'05.63" N 56°04'39.26"