| Author |
Topic |
|
JohnL
Starting Member
3 Posts |
Posted - 2010-03-25 : 11:16:16
|
| Is there any way to calculate averages based on distinct values from another field.I have some OLAP style data in SQL-Server 2005 that looks something like this when de-normalised :-Patient ID Rx ID Product Rx Action Diagnosis Doctor Specialty Exit Dose---------- ----- ------- --------- --------- ---------------- ---------690529 1317203 acarbose Repeat Type II diabetes mellitus Diabetes NULL690529 1317203 acarbose Repeat Type II diabetes mellitus Endocrinology NULL703972 1347083 acarbose Repeat Type II diabetes mellitus NULL 100707196 1353727 acarbose Repeat Type II diabetes mellitus NULL 200714561 1370887 acarbose Repeat Diabetes Diabetes 100714561 1370887 acarbose Repeat Diabetes Elderly medicine 100714561 1370887 acarbose Repeat Diabetes Stroke 100714561 1370887 acarbose Repeat Obesity Diabetes 100714561 1370887 acarbose Repeat Obesity Elderly medicine 100714561 1370887 acarbose Repeat Obesity Stroke 100714561 1370887 acarbose Repeat Hypertension Diabetes 100714561 1370887 acarbose Repeat Hypertension Elderly medicine 100714561 1370887 acarbose Repeat Hypertension Stroke 100714561 1370887 acarbose Repeat Ischaemic heart disease Diabetes 100714561 1370887 acarbose Repeat Ischaemic heart disease Elderly medicine 100714561 1370887 acarbose Repeat Ischaemic heart disease Stroke 100714561 1370887 acarbose Repeat Stroke Diabetes 100714561 1370887 acarbose Repeat Stroke Elderly medicine 100714561 1370887 acarbose Repeat Stroke Stroke 100714561 1370887 acarbose Repeat Hyperlipidaemia Diabetes 100714561 1370887 acarbose Repeat Hyperlipidaemia Elderly medicine 100714561 1370887 acarbose Repeat Hyperlipidaemia Stroke 100My application is generating dynamic T-SQL to analyse this data with the required groupings and sub-totalse.g.select Product, 'Total Specialty' as Specialty, 'Total Diagnosis' as Diagnosis, count(distinct PatientID ) as [Patient Count], count( distinct RxID ) as [Rx Count], Avg( ExitDose ) as [Avg. Dose]from #Tempgroup by Productunion allselect Product, ' ' + Coalesce( DoctorSpecialty, '-' ), 'Total Diagnosis' as Diagnosis, count(distinct PatientID ), count( distinct RxID ), Avg( ExitDose ) from #Tempgroup by Product, ' ' + Coalesce( DoctorSpecialty, '-' )returnsProduct Specialty Diagnosis Patient Count Rx Count Avg. Dose------- --------- --------- ------------- -------- ---------acarbose Total Specialty Total Diagnosis 4 4 105acarbose - Total Diagnosis 2 2 150acarbose Diabetes Total Diagnosis 2 2 100acarbose Elderly medicine Total Diagnosis 1 1 100acarbose Endocrinology Total Diagnosis 1 1 NULLacarbose Stroke Total Diagnosis 1 1 100All looks good? Actually no.The patient count and Rx count figures are correct, but the average dose figure is wrong (especially at the total speciality level - where the value should be 133.33). This is because the last 18 records are not actually different values at all, but are actually for the same prescription that is just duplicated in the data for different diagnosis / doctor speciality values - which is skewing the average calculation.I would like to calculate the average by taking only one value for each distinct Rx ID value - is there any simple way to do this? |
|
|
DBA in the making
Aged Yak Warrior
638 Posts |
Posted - 2010-03-25 : 11:33:48
|
| Where did the 133.33 figure come from? In the query that generates the Total Specialty record, you're averaging the entire table. 20 Values, total of 2100. 2100/20 = 105.There are 10 types of people in the world, those that understand binary, and those that don't. |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2010-03-25 : 11:37:43
|
| how do you populate #Temp? can you show its query?------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/ |
 |
|
|
JohnL
Starting Member
3 Posts |
Posted - 2010-03-25 : 11:57:26
|
| The 133.33 figure comes from the fact that there are 4 distinct Rx ID values in the data1317203 - dose = null1347083 - dose = 1001353727 - dose = 2001370887 - dose = 100so what I need is the average of ( null, 100, 200, 100 ) = 400 / 3 = 133.33I know this won't be produced by the SQL statement I provided - I just need to know if there is any way to actually do this - calculate average value from all this data just taking 1 row for each distinct Rx ID value..The #Temp table is just to do proof of concept. The actual data comes from a view in the OLAP database. |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2010-03-25 : 12:06:44
|
| ok then show actual query for view please------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/ |
 |
|
|
JohnL
Starting Member
3 Posts |
Posted - 2010-03-25 : 12:18:08
|
| The source of the data isn't important to solving my problem.Just assume that you have a single table with the following structurecreate table #Temp( PatientID int, RxID int, Product varchar(max), RxAction varchar(max), Diagnosis varchar(max), DoctorSpecialty varchar(max), ExitDose float) populated with the data at the top of this topic.I want to calculate an average of ExitDose values, but only take one record for each distinct RxID value. Is there any way to do this? |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2010-03-25 : 12:20:47
|
| nope its important for getting distinct value. Would you please post it as requested?------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/ |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2010-03-25 : 12:21:09
|
| or atleast suggest a unique valued column in your view?------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/ |
 |
|
|
DBA in the making
Aged Yak Warrior
638 Posts |
Posted - 2010-03-25 : 12:24:46
|
You might be looking for something like this:select t1.Product, 'Total Specialty' as Specialty, 'Total Diagnosis' as Diagnosis, count(distinct t1.PatientID ) as [Patient Count], count( distinct t1.RxID ) as [Rx Count], Avg( t2.ExitDose ) as [Avg. Dose]from #Temp t1INNER JOIN ( SELECT x.Product, AVG(x.ExitDose ) as ExitDose FROM ( SELECT DISTINCT t.Product, t.RxID, CAST (t.ExitDose AS DECIMAL(8,2)) AS ExitDose FROM #temp t) x GROUP BY x.Product ) t2 ON t1.Product = t2.ProductGROUP BY t1.Productunion all... The CAST will prevent that trailing decimal places from being truncated, and ExitDose appears to be an integer.There are 10 types of people in the world, those that understand binary, and those that don't.EDIT: Forgot the final group by. |
 |
|
|
|