Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Box and Whisker Plot

Author  Topic 

stwp86
Starting Member

42 Posts

Posted - 2012-09-26 : 16:49:09
Hey Everyone . . .

I really hate doing this , but I lack the necessary skills to write this from scratch and have not been able to find query help regarding this topic.

I have had great success posting here with this great community of experts, so I thought I would give it another go . . .

I will try to lay everything out that I have been able to gather to this point.

I have the following table/columns(name: workflow_data):
Table Columns:
Workflow_ID
Duration
Start_Time
End_Time
User_Name
Server
Client
Process_Start
Process_End
Process_Checkpoints
Definition
Client
Group_Type

I will be using a reporting tool to create the Box and Whisker plot (5 values are required to make this work)
Required Values:
Median
Max
Min
Lower Quartile
Upper Quartile

This information came from here: [url]http://www.bidn.com/blogs/mikedavis/ssis/163/how-to-make-a-box-plot-chart-in-sql-reporting-services-2008-ssrs[/url]

The bases for these calculations would be the duration column from the SQL table. I would ideally like to restrict based on the definition, client, and/or group_type fields and group by work week (exclude weekends) and display as the monday of the week that the data falls in.

In the past I have handled the work week grouping like this: CONVERT(varchar(50), (DATEADD(d, @@DATEFIRST - DATEPART(dw, Start_Time) - 5, Start_Time)), 101)

And I would also exclude the weekends like this in the where clause: datepart(dw, workflow_data.Start_Time) not in (1,7)

I feel bad asking for help in this manner, but am stuck trying to get my team by while we wait for a SME to join us.

Thanks in advance

Travis

stwp86
Starting Member

42 Posts

Posted - 2012-09-27 : 11:23:24
I was able to locate a great post on this topic and I have gotten to this point:

WITH SourceData AS (

--Table of sets over which we want to produce multiple quartiles.
SELECT sales.duration AS Measure,
-- sales.definition AS GroupID,
CONVERT(varchar(50), (DATEADD(d, @@DATEFIRST - DATEPART(dw, sales.Start_Time) - 5, sales.Start_Time)), 101) As GroupID,
sales.duration as MaxMeasure,
sales.duration as MinMeasure
FROM Workflow_Data AS sales
WHERE sales.client_mnemonic = 'XXXX_XX' and
sales.Definition in
(
'Meds Rec - Discharge',
'Meds Rec - Transfer',
'Meds Rec - Admission'
) and datepart(dw, sales.Start_Time) not in (1,7)

)
--Aggregate into a single record for each group, using MAX to select the non-null
--detail value for each column.
SELECT GroupID,
(Max(Q1NextVal) - MAX(Q1Val)) * Max(Q1Frac) + Max(Q1Val) Quartile1,
(Max(MidVal1) + Max(MidVal2)) / 2 Median,
(Max(Q3NextVal) - MAX(Q3Val)) * Max(Q3Frac) + Max(Q3Val) Quartile3

--Save into a separate table if desired.
--INTO dbo.PreCalculatedQuartiles
FROM (
--Expose the detail values for only the records at the index values
--generated by the summary subquery. All other values are left as NULL.
SELECT detail.GroupID,
CASE WHEN RowNum = Q1Idx THEN Measure ELSE NULL END Q1Val,
CASE WHEN RowNum = Q1Idx + 1 THEN Measure ELSE NULL END Q1NextVal,
CASE WHEN RowNum = Q3Idx THEN Measure ELSE NULL END Q3Val,
CASE WHEN RowNum = Q3Idx + 1 THEN Measure ELSE NULL END Q3NextVal,
Q1Frac,
Q3Frac,
CASE WHEN RowNum = MidPt1 THEN Measure ELSE NULL END MidVal1,
CASE WHEN RowNum = MidPt2 THEN Measure ELSE NULL END MidVal2
FROM
--Calculate a row number sorted by measure for each group.
(SELECT *, ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY Measure) RowNum
FROM SourceData) AS detail

INNER JOIN (
--Summarize to find index numbers and fractions we need to use to locate
--the values at the quartile points.
SELECT GroupID,
FLOOR((COUNT(*) + 3) / 4.0) Q1Idx,
((COUNT(*) + 3) / 4.0) - FLOOR((COUNT(*) + 3) / 4.0) Q1Frac,
(COUNT(*) + 1) / 2 AS MidPt1,
(COUNT(*) + 2) / 2 AS Midpt2,
FLOOR((COUNT(*) * 3 + 1) / 4.0) Q3Idx,
((COUNT(*) * 3 + 1) / 4.0) - FLOOR((COUNT(*) * 3 + 1) / 4.0) Q3Frac
FROM SourceData
GROUP BY GroupID
HAVING COUNT(*) > 1

) AS summary ON detail.GroupID = summary.GroupID

) AS combined
GROUP BY GroupID;



I am just missing the Min and Max duration, any thoughts on how I would add that in to the above query?
thanks!

Travis
Go to Top of Page

stwp86
Starting Member

42 Posts

Posted - 2012-09-27 : 11:39:47
Figured it out:

Here is the finished product incase anyone else needs something similar:

WITH SourceData AS (

--Table of sets over which we want to produce multiple quartiles.
SELECT sales.duration AS Measure,
CONVERT(varchar(50), (DATEADD(d, @@DATEFIRST - DATEPART(dw, sales.Start_Time) - 5, sales.Start_Time)), 101) As GroupID
FROM Workflow_Data AS sales
WHERE sales.client_mnemonic = 'XXXX_XX'and
sales.Definition in
(
'Meds Rec - Discharge',
'Meds Rec - Transfer',
'Meds Rec - Admission'
) and datepart(dw, sales.Start_Time) not in (1,7)

)
--Aggregate into a single record for each group, using MAX to select the non-null
--detail value for each column.
SELECT GroupID,
(Max(Q1NextVal) - MAX(Q1Val)) * Max(Q1Frac) + Max(Q1Val) Quartile1,
(Max(MidVal1) + Max(MidVal2)) / 2 Median,
(Max(Q3NextVal) - MAX(Q3Val)) * Max(Q3Frac) + Max(Q3Val) Quartile3,
Max(measure) maxMeasure,
Min(measure) minMeasure

--Save into a separate table if desired.
--INTO dbo.PreCalculatedQuartiles
FROM (
--Expose the detail values for only the records at the index values
--generated by the summary subquery. All other values are left as NULL.
SELECT detail.GroupID,
CASE WHEN RowNum = Q1Idx THEN Measure ELSE NULL END Q1Val,
CASE WHEN RowNum = Q1Idx + 1 THEN Measure ELSE NULL END Q1NextVal,
CASE WHEN RowNum = Q3Idx THEN Measure ELSE NULL END Q3Val,
CASE WHEN RowNum = Q3Idx + 1 THEN Measure ELSE NULL END Q3NextVal,
Q1Frac,
Q3Frac,
CASE WHEN RowNum = MidPt1 THEN Measure ELSE NULL END MidVal1,
CASE WHEN RowNum = MidPt2 THEN Measure ELSE NULL END MidVal2,
Detail.measure
FROM
--Calculate a row number sorted by measure for each group.
(SELECT *, ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY Measure) RowNum
FROM SourceData) AS detail

INNER JOIN (
--Summarize to find index numbers and fractions we need to use to locate
--the values at the quartile points.
SELECT GroupID,
FLOOR((COUNT(*) + 3) / 4.0) Q1Idx,
((COUNT(*) + 3) / 4.0) - FLOOR((COUNT(*) + 3) / 4.0) Q1Frac,
(COUNT(*) + 1) / 2 AS MidPt1,
(COUNT(*) + 2) / 2 AS Midpt2,
FLOOR((COUNT(*) * 3 + 1) / 4.0) Q3Idx,
((COUNT(*) * 3 + 1) / 4.0) - FLOOR((COUNT(*) * 3 + 1) / 4.0) Q3Frac
FROM SourceData
GROUP BY GroupID
HAVING COUNT(*) > 1

) AS summary ON detail.GroupID = summary.GroupID

) AS combined
GROUP BY GroupID;
Go to Top of Page

bitsmed
Aged Yak Warrior

545 Posts

Posted - 2012-09-27 : 12:22:06
Thanks you for sharing.
Go to Top of Page
   

- Advertisement -