SQL Query asked by a recruiter
By Bill Graziano
on 7 November 2000
| 14 Comments
| Tags: Queries
Tom writes "During an interview, a recruiter asked me this question saying that it is probably the most difficult SQL query he can think of. I am tempted to say it is not possible to accomplish in a single query... maybe you know . . ." Read on to see what dastardly question the recruiter asked our intrepid student and see if the SQL gurus can answer it.
The recruiter asked "The database is a bug tracking database. There is really only one table that is relevent for this question. The name of the table is bugs. It has several columns: open_date, close_date, bugID, and severity.
A bug is considered open on a given day if it's open date is on or before that day, and it's close date is on or after that day.
So the goal is to generate a histogram to show the number of open bugs for a range of dates. The ideal result set would have two columns: date and open bug count on that date.
so the question is, can you do that in one query, and if so, what does that query look like?"
I've seen this one before but I've never been in a situation where I've had to actually answer it. This type of query is actually fairly common though so we'll give it a shot. I'll also give you some hints on what you could have told to recruiter to put them in their place.
SELECT statements are designed to return data out of tables. The only way that I know of to make this work perfectly is to have a table with every possible date in it. You can join this to your bug table and make it work quite easily. This is the only way I know of using a SINGLE SELECT statement to make this work the way they want it. Tell the recruiter they need a better database design to meet the requirements of the project.
The second easiest way is to build a temporary table with all the dates you need in it (every date between the earliest open date and the latest close date) and join that back to your table. That doesn't accomplish it in a single SELECT statement but it does accomplish it in a single stored procedure. If they don't like this, ask them why you can't use the proper tools required to get the job done.
The third easiest way is to go buy some real bug tracking software. Ask them why they don't provide the proper tools for their development teams.
The fourth easiest way and a real kludge goes something like this:
SELECT DISTINCT BugDate = OpenDate,
BugCount = (SELECT Count(*)
FROM Bugs X
WHERE X.OpenDate <= B1.OpenDate
AND CloseDate >= B1.OpenDate )
FROM Bugs B1
SELECT DISTINCT BugDate = CloseDate,
BugCount = (SELECT Count(*)
FROM Bugs X
WHERE X.OpenDate <= B2.CloseDate
AND CloseDate >= B2.CloseDate )
FROM Bugs B2
ORDER BY 1
This query uses the table of open and close dates to generate the base dates for the query. Then it uses subqueries to get the count of open bugs for each date. You can see how the table of all possible dates would come in real handy here.
A big flaw is the holes in the result set. If no bug was opened or closed on a date then that date won't appear in the table. The other small flaw it has is that all bugs must be closed in the table since there is no provision for NULLs in the CloseDate field. I think this is the best you can do with one SQL statement and their one provided table.
We'd have to ask Tom how well he remembered the recruiters question. The recruiter may not have asked for all the dates though I suspect they wanted them. They also didn't specify how unclosed bugs are to be handled. It's always good to be precise in these things.
Did you ask to see their solution? I'd be interested if you could post it here. And thanks for the really cool question. I stayed up far too late tonight answering it.
Updated: (3/27/2001) Amazingly enough, an astute reader emailed us a solution to this! And it's a good one!