Reader Challenge #1 Solutions (Part I)
By Bill Graziano
on 28 May 2001
| 3 Comments
| Tags: Reader Challenges
Thanks to the 18 people that submitted 27 possible solutions to our first Reader Challenge. In this article I've broken down the first correct solution which was submitted by Michael Price. I'll detail the cleanest, fastest and most obtuse solutions in the next article.
First I'd like to thank everyone for entering. I'm glad that 18 of you found this challenging enough to take the time to submit solutions. I'm also amazed at the variety of solutions that were submitted. This article details the first correct solution I received. I know there are other solutions that are faster and simpler but this solution was first, it was correct and I liked it. I think Michael may also have submitted the fastest solution but you'll have to wait until the next article.
The Solution
Michael wrote in his email:
Hello,
I figured You Are Kidding (YAK!) in trying to get the results in one select statement, but here is what I came up with.
I went for the Single Select/speed challenge part, it might also win the obtuse award <g>.
Nice yak reference but no bonus points since it's not in the solution. Michael actually submitted two solutions in the same email. And he submitted BOTH of them within 1 hour and 25 minutes from when I posted the challenge.
Here is Michael's solution:
Select Avg(Above) As Above, Count(*) As Below
From Samples, (
Select Count(*) As Above
From Samples
Where Ind_Ret > ( Select Avg(Ind_Ret) + (Stdev(Ind_Ret) * 3) From Samples )
) As AboveData
Where Ind_Ret < ( Select Avg(Ind_Ret) - (Stdev(Ind_Ret) * 3) From Samples )
which returns this result set
Above Below
----------- -----------
7 4
(1 row(s) affected))
which is the correct result. I've modified Michael's code slightly to use the AS keyword to make column naming more explicit. It doesn't change the functionality but makes it easier to explain. I liked this solution because it uses a subquery, a derived table and a "psuedo-cartesion join" in a way I've never seen before. It also gives me three topics for future articles. I'm going to break this down starting at the inside SELECT and working my way out.
The Subquery
The deepest SELECT statement is
Select Avg(Ind_Ret) + (Stdev(Ind_Ret) * 3) From Samples
If you hightlight this section of code and run it in Query Analyzer you get 582.485 back. This adds three times the standard deviation to the average and is the upper bound of the result set. Any value above this we want to count. Expanding out from this we get this piece of code
Select Count(*) As Above
From Samples
Where Ind_Ret > ( Select Avg(Ind_Ret) + (Stdev(Ind_Ret) * 3) From Samples )
Our first SELECT statement is now the subquery in the WHERE clause. Books Online says in Transact-SQL, a subquery can be substituted anywhere an expression can be used in SELECT, UPDATE, INSERT, and DELETE statements, except in an ORDER BY list. The result of this query is
Above
-----------
7
(1 row(s) affected)
which is half the answer.
The Derived Table
Adding a little more of the full query in and we get this statement
Select Avg(Above) As Above
From (
Select Count(*) As Above
From Samples
Where Ind_Ret > ( Select Avg(Ind_Ret) + (Stdev(Ind_Ret) * 3) From Samples )
) As AboveData
Our SELECT statement from above is now a derived table. We are replacing a table name in a query with a SELECT statement that returns a "virtual" table. You can run this with either Avg(Above) As Above or Above in the outer SELECT statement. You can change the code and see that it runs either way and they return the same result set. I'll tell you why in a second.
The Other Half
The other half of the result comes from the remainder of the query. If I strip out the part we've already discussed it leaves us with
Select Count(*) As Below
From Samples
Where Ind_Ret < ( Select Avg(Ind_Ret) - (Stdev(Ind_Ret) * 3) From Samples )
and that returns this result set
Below
-----------
4
(1 row(s) affected)
That looks an awful lot like what I just covered so I won't go over it again.
The Cartesian Join
To get both values in the same result set, Michael just "stuck" the two tables (one actual table and one derived table) together in a cartesian join. This joins each row in the first table to each row in the second table. In this case there's only one row in each table so it works out perfectly. Since one field is an aggregate (COUNT), the other also has to be an aggregate (or he has to use a group by). So he takes the average of a single value (seven) which is the value itself. Pretty clever. In cases where I've needed to do this I've always had each field be a subquery (which another reader submitted).
Summary
This solution generates four table scans. That probably won't win any performance awards. For the record, his second solution (in the same email) reduced this to three table scans. And he reduced it to two tables scans the following morning. You'll have to wait for the next article to see if anyone did better.
Thanks for reading Michael and thanks for taking the challenge. If I had any prizes I'd gladly pass one your way. You'll have to settle for the fame and glory of submitting the first correct answer to the first SQLTeam.com Reader Challenge.