Reader Challenge #1 Solutions (Part I)

By Bill Graziano on 28 May 2001 | 3 Comments | Tags: Reader Challenges


Thanks to the 18 people that submitted 27 possible solutions to our first Reader Challenge. In this article I've broken down the first correct solution which was submitted by Michael Price. I'll detail the cleanest, fastest and most obtuse solutions in the next article.

First I'd like to thank everyone for entering. I'm glad that 18 of you found this challenging enough to take the time to submit solutions. I'm also amazed at the variety of solutions that were submitted. This article details the first correct solution I received. I know there are other solutions that are faster and simpler but this solution was first, it was correct and I liked it. I think Michael may also have submitted the fastest solution but you'll have to wait until the next article.

The Solution

Michael wrote in his email:

Hello,

I figured You Are Kidding (YAK!) in trying to get the results in one select statement, but here is what I came up with.

I went for the Single Select/speed challenge part, it might also win the obtuse award <g>.

Nice yak reference but no bonus points since it's not in the solution. Michael actually submitted two solutions in the same email. And he submitted BOTH of them within 1 hour and 25 minutes from when I posted the challenge.

Here is Michael's solution:

Select Avg(Above) As Above, Count(*) As Below
  From Samples, (
                  Select Count(*) As Above 
                    From Samples
                   Where Ind_Ret > ( Select Avg(Ind_Ret) + (Stdev(Ind_Ret) * 3) From Samples )
                 ) As AboveData
 Where Ind_Ret < ( Select Avg(Ind_Ret) - (Stdev(Ind_Ret) * 3) From Samples )

which returns this result set

Above       Below       
----------- ----------- 
7           4

(1 row(s) affected))

which is the correct result. I've modified Michael's code slightly to use the AS keyword to make column naming more explicit. It doesn't change the functionality but makes it easier to explain. I liked this solution because it uses a subquery, a derived table and a "psuedo-cartesion join" in a way I've never seen before. It also gives me three topics for future articles. I'm going to break this down starting at the inside SELECT and working my way out.



The Subquery

The deepest SELECT statement is

Select Avg(Ind_Ret) + (Stdev(Ind_Ret) * 3) From Samples 

If you hightlight this section of code and run it in Query Analyzer you get 582.485 back. This adds three times the standard deviation to the average and is the upper bound of the result set. Any value above this we want to count. Expanding out from this we get this piece of code

Select Count(*) As Above 
From Samples
Where Ind_Ret > ( Select Avg(Ind_Ret) + (Stdev(Ind_Ret) * 3) From Samples )

Our first SELECT statement is now the subquery in the WHERE clause. Books Online says in Transact-SQL, a subquery can be substituted anywhere an expression can be used in SELECT, UPDATE, INSERT, and DELETE statements, except in an ORDER BY list. The result of this query is

Above       
----------- 
7

(1 row(s) affected)

which is half the answer.

The Derived Table

Adding a little more of the full query in and we get this statement

Select Avg(Above) As Above 
  From  (
                  Select Count(*) As Above 
                    From Samples
                   Where Ind_Ret > ( Select Avg(Ind_Ret) + (Stdev(Ind_Ret) * 3) From Samples )
                 ) As AboveData

Our SELECT statement from above is now a derived table. We are replacing a table name in a query with a SELECT statement that returns a "virtual" table. You can run this with either Avg(Above) As Above or Above in the outer SELECT statement. You can change the code and see that it runs either way and they return the same result set. I'll tell you why in a second.

The Other Half

The other half of the result comes from the remainder of the query. If I strip out the part we've already discussed it leaves us with

Select Count(*) As Below
  From Samples
 Where Ind_Ret < ( Select Avg(Ind_Ret) - (Stdev(Ind_Ret) * 3) From Samples )

and that returns this result set

Below       
----------- 
4

(1 row(s) affected)

That looks an awful lot like what I just covered so I won't go over it again.

The Cartesian Join

To get both values in the same result set, Michael just "stuck" the two tables (one actual table and one derived table) together in a cartesian join. This joins each row in the first table to each row in the second table. In this case there's only one row in each table so it works out perfectly. Since one field is an aggregate (COUNT), the other also has to be an aggregate (or he has to use a group by). So he takes the average of a single value (seven) which is the value itself. Pretty clever. In cases where I've needed to do this I've always had each field be a subquery (which another reader submitted).

Summary

This solution generates four table scans. That probably won't win any performance awards. For the record, his second solution (in the same email) reduced this to three table scans. And he reduced it to two tables scans the following morning. You'll have to wait for the next article to see if anyone did better.

Thanks for reading Michael and thanks for taking the challenge. If I had any prizes I'd gladly pass one your way. You'll have to settle for the fame and glory of submitting the first correct answer to the first SQLTeam.com Reader Challenge.

Discuss this article: 3 Comments so far. Print this Article. This page has been read 11,097 times.

If you like this article you can sign up for our newsletter. We send it out each week that we post a new article. There's an opt-out link at the bottom of each newsletter so it's easy to unsubscribe at any time.

Email Address:

Email ThisSubscribe to this feedKick itSave to del.icio.usView blog reactions

Related Articles

Another German Yak ... with a suprise (RC #3) (16 July 2002)

Das Yak ist Deutsch (RC #3) (14 July 2002)

Reader Challenge #3: Find the Yak! (22 March 2002)

Reader Challenge #2 Solutions (29 October 2001)

Reader Challenge #2 (CLOSED) (10 October 2001)

Reader Challenge #1 Solutions (Part II) (4 June 2001)

Reader Challenge #1 (16 May 2001)

Other Recent Forum Posts

SSIS Certificate Authentication : The request fail (1 Reply)

Categorizing Products in Orders (4 Replies)

Fragmentation question (0 Replies)

MD5 (15 Replies)

PL/SQL TO TSQL (0 Replies)

Datawarehouse Question (2 Replies)

summing, join not working (3 Replies)

the query is creating output with 8000 or 4000 cha (0 Replies)

Subscribe to SQLTeam.com

Weekly SQL Server newsletter with articles, forum posts, and blog posts via email:

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

- Sponsor's Message -

SQLShare.com Videos

Using DatePart and DateName

Ever want to get the month out of a date as a number, or as literal text? Many people will do it by parsing the date as a string, but we've got some built in functions that will do it cleanly and consistently.

Renaming a Database

You won't do it often, but it's nice to know how, and you're not still using sp_renamedb are you? Join us for a quick look at how to use the alter syntax to change the db name along with a tip on how to quickly disconnect any remaining users from the database.

File Share Subscriptions in Reporting Services

Whether you want to generate PDF invoices for customers or do a daily export that will get processed by one of your vendors, the ability to deliver reports to a file share is a useful and simple feature baked in to Reporting Services. In this lesson Devin will show you how to do it and how to set most of the common options.

File Share Subscriptions in Reporting Services

Whether you want to generate PDF invoices for customers or do a daily export that will get processed by one of your vendors, the ability to deliver reports to a file share is a useful and simple feature baked in to Reporting Services. In this lesson Devin will show you how to do it and how to set most of the common options.

Using DB_ID and DB_Name Functions

Simple but effective, DB_ID and DB_Name give you a concise way to look up the id of a database from a name, or look up the name of a database from an id. There are times when you'll need to write the join to sys.sysdatabases, but when all you need is a quick conversion, these functions get it done.