Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 Script Library
 Better median??

Author  Topic 

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2002-12-07 : 14:58:00
I recently had to use my own little median technique again on a report here at work, and had posted it before, but wasn't sure if anyone had seen it. I have read Celko's and others techniques for generating a median and haven't seen one more efficent.

Does anyone have a better way they can think of? I think this bad boy is pretty short & efficient.

First, if you want to return the middle number or the higher one next to the middle if there is an even number:

SELECT x.Value AS median
FROM Vals x
CROSS JOIN Vals y
GROUP BY x.Value
HAVING SUM(SIGN(x.Value-y.Value)) IN (1,0)

Change the " IN (1,0)" to "IN (-1,0)" to get the lower value if there is an even # of values.

Basically, we are saying compare each number to all possible numbers, and add up values of 1,0 or -1 depending if the first number is less, equal or higher than the second. The number that returns 0 is right in the middle ... If there is no middle, a -1 or 1 is returned. There will never be a 0 and (-1 or 1) at the same time returned.

To get the financial median (avg of the 2 values middle values if there is an even number), you need to encapsulate the results of the above into a subquery, allow for not just (-1,0) but all three (-1,0,1) and then take the AVG of the values returned.

That is,

SELECT Avg(Median) as Median FROM
(
SELECT x.Value AS median
FROM Vals x
CROSS JOIN Vals y
GROUP BY x.Value
HAVING SUM(SIGN(x.Value-y.Value)) IN (1,0,-1)
) A

If there is an even number of values, the lower and higher middle ones are averaged. If there is an odd number, only the middle value is returned and averaged (which of course has no effect).

Most other techniques used several COUNT(*) subqueries which this one avoids.

Critique and enjoy!

- Jeff

SamC
White Water Yakist

3467 Posts

Posted - 2002-12-08 : 09:20:43
Hi Jeff,

Way cool. I can see other applications to find the Nth item in a list (SUM(SIGN) = N) , or every other item (SUM(SIGN) EVEN OR ODD).

I don't understand the requirement (or application) for an average that's composed of the average of the two middle values. Why not the more common average of all values? Maybe I misunderstood the result? If you have a good reason, I'd like to hear it.

Sam

Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-12-08 : 09:30:27
That's the definition of the financial median. For an even number of rows, the financial median is the average of the two middle values.

The statistical median is either middle value, depending on which one you decide to choose, but it has to be an actual value in the set.

Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2002-12-08 : 15:07:11
Yep, those are the reasons!

We always use the financial median (avg of two middle ones) here at work.

Oh, and I almost forgot, the nicest thing about my technique is that it's very easy to get a median for groups of records:

SELECT Company, Avg(Median) as Median FROM
(
SELECT x.Company, x.Value AS median
FROM Vals x
INNER JOIN Vals y
ON x.Company = y.Company
GROUP BY x.Company, x.Value
HAVING SUM(SIGN(x.Value-y.Value)) IN (1,0,-1)
) A

The other techniques make doing this somewhat complicated.


- Jeff

Edited by - jsmith8858 on 12/08/2002 15:07:59
Go to Top of Page

Andraax
Aged Yak Warrior

790 Posts

Posted - 2002-12-08 : 17:00:21
Nice one Jeff! Very useful!

Go to Top of Page

Frank Kalis
Constraint Violating Yak Guru

413 Posts

Posted - 2004-03-05 : 17:24:58
Surely a little bit late to reply, but here it goes.
There are different definitions of the words 'financial' and 'statistical' median. While one folks argue that in both cases it is the mean of the two middle values, some other folks say that it has to be a concrete value not an abstract. Those folks take always the LOWER of the both middle value. However, which way you go isn't important at all. With a little help from the proprietary TOP clause both can be very easily calculated.

Here is the version that takes the mean of both middle values:

IF OBJECT_ID('median') IS NOT NULL
DROP TABLE median
GO
CREATE TABLE median(
col1 INT
)
GO
INSERT INTO median (col1) VALUES (1)
INSERT INTO median (col1) VALUES (2)
INSERT INTO median (col1) VALUES (3)
INSERT INTO median (col1) VALUES (3)
INSERT INTO median (col1) VALUES (4)
INSERT INTO median (col1) VALUES (7)
INSERT INTO median (col1) VALUES (8)
INSERT INTO median (col1) VALUES (9)


SELECT
((SELECT MIN(Col1)
FROM
(SELECT TOP 50 PERCENT Col1
FROM Median
ORDER BY Col1 DESC) a)+
(SELECT MAX(Col1)
FROM
(SELECT TOP 50 PERCENT Col1
FROM Median
ORDER BY Col1) a))/2. AS Median

Median
------------------
3.500000

(1 row(s) affected)


And here's the version where the lower of both middle values is taken:

SELECT MAX(Col1) AS Median
FROM
(SELECT TOP 50 PERCENT Col1
FROM Median
ORDER BY Col1) a
Median
-----------
3

(1 row(s) affected)

While the median as a measure of central tendency is not so teribbly useful at all, this code should also work with quartiles or whatever fancy there is. Both methods work with even and odd numbers of rows.


--Frank
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2004-03-10 : 14:06:19
Wow, just saw this...sure I'll run in to that now that we're converting everything to OFA...

It's going to be painful...

Thanks for the code Jeff...

Book mark it!



Brett

8-)
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-07-19 : 16:05:48
For SQL Server 2005, use this
asc desc diff 
1 4 3 -- even number of datapoints
2 3 1
3 2 -1
4 1 -3

1 3 2 -- odd number of datapoints
2 2 0
3 1 -2

select avg(val) as median from
(select row_number() over (order by val asc) as sort_asc
, row_number() over (order by val desc) as sort_desc
, val
from table1) as t
where sort_asc - sort_desc between -1 and 1


Peter Larsson
Helsingborg, Sweden
Go to Top of Page
   

- Advertisement -