Return to Reader Challenge #1 Solutions (Part I) ## Reader Challenge #1 Solutions (Part I)Written by Thanks to the 18 people that submitted 27 possible solutions to our first First I'd like to thank everyone for entering. I'm glad that 18 of you found this challenging enough to take the time to submit solutions. I'm also amazed at the variety of solutions that were submitted. This article details the first correct solution I received. I know there are other solutions that are faster and simpler but this solution was first, it was correct and I liked it. I think Michael may also have submitted the fastest solution but you'll have to wait until the next article. ## The SolutionMichael wrote in his email:
Nice yak reference but no bonus points since it's not in the solution. Michael actually submitted two solutions in the same email. And he submitted BOTH of them within 1 hour and 25 minutes from when I posted the challenge. Here is Michael's solution: Select Avg(Above) As Above, Count(*) As Below From Samples, ( Select Count(*) As Above From Samples Where Ind_Ret > ( Select Avg(Ind_Ret) + (Stdev(Ind_Ret) * 3) From Samples ) ) As AboveData Where Ind_Ret < ( Select Avg(Ind_Ret) - (Stdev(Ind_Ret) * 3) From Samples ) which returns this result set Above Below ----------- ----------- 7 4 (1 row(s) affected)) which is the correct result. I've modified Michael's code slightly to use the AS keyword to make column naming more explicit. It doesn't change the functionality but makes it easier to explain. I liked this solution because it uses a subquery, a derived table and a "psuedo-cartesion join" in a way I've never seen before. It also gives me three topics for future articles. I'm going to break this down starting at the inside SELECT and working my way out. ## The SubqueryThe deepest SELECT statement is Select Avg(Ind_Ret) + (Stdev(Ind_Ret) * 3) From Samples If you hightlight this section of code and run it in Query Analyzer you get 582.485 back. This adds three times the standard deviation to the average and is the upper bound of the result set. Any value above this we want to count. Expanding out from this we get this piece of code Select Count(*) As Above From Samples Where Ind_Ret > ( Select Avg(Ind_Ret) + (Stdev(Ind_Ret) * 3) From Samples ) Our first SELECT statement is now the subquery in the WHERE clause. Books Online says Above ----------- 7 (1 row(s) affected) which is half the answer. ## The Derived TableAdding a little more of the full query in and we get this statement Select Avg(Above) As Above From ( Select Count(*) As Above From Samples Where Ind_Ret > ( Select Avg(Ind_Ret) + (Stdev(Ind_Ret) * 3) From Samples ) ) As AboveData Our SELECT statement from above is now a derived table. We are replacing a table name in a query with a SELECT statement that returns a "virtual" table. You can run this with either Avg(Above) As Above or Above in the outer SELECT statement. You can change the code and see that it runs either way and they return the same result set. I'll tell you why in a second. ## The Other HalfThe other half of the result comes from the remainder of the query. If I strip out the part we've already discussed it leaves us with Select Count(*) As Below From Samples Where Ind_Ret < ( Select Avg(Ind_Ret) - (Stdev(Ind_Ret) * 3) From Samples ) and that returns this result set Below ----------- 4 (1 row(s) affected) That looks an awful lot like what I just covered so I won't go over it again. ## The Cartesian JoinTo get both values in the same result set, Michael just "stuck" the two tables (one actual table and one derived table) together in a cartesian join. This joins each row in the first table to each row in the second table. In this case there's only one row in each table so it works out perfectly. Since one field is an aggregate (COUNT), the other also has to be an aggregate (or he has to use a group by). So he takes the average of a single value (seven) which is the value itself. Pretty clever. In cases where I've needed to do this I've always had each field be a subquery (which another reader submitted). ## SummaryThis solution generates four table scans. That probably won't win any performance awards. For the record, his second solution (in the same email) reduced this to three table scans. And he reduced it to two tables scans the following morning. You'll have to wait for the next article to see if anyone did better. Thanks for reading Michael and thanks for taking the challenge. If I had any prizes I'd gladly pass one your way. You'll have to settle for the fame and glory of submitting the first correct answer to the first SQLTeam.com Reader Challenge. |