Return to Detecting "Runs" or "Streaks" in your data
Detecting "Runs" or "Streaks" in your data
Written by Jeff Smith on 16 February 2003
This article comes from Jeff Smith. Jeff writes "It can be difficult in a set-based language such as SQL to determine when a number of records in a row (when ordered by a unique key) have the same values in a particular field. For example, you may have a table of GameResults, with fields such as GameDate (primary key) and result (W for win, L for loss), and you may wish to find out what kinds of winning or losing streaks exist in your data."
Here's our sample table:
Looking at the above table, it is easy to get what our "Win-Loss" record is for a period of time:
SELECT SUM(CASE WHEN Result = 'W' THEN 1 ELSE 0 END) as Wins,
SUM(CASE WHEN Result = 'L' THEN 1 ELSE 0 END) as Losses FROM GameResults
But how we do determine what was the longest winning streak, or losing streak, in that time period? Or, as of the last date, what the current streak is?
The trick is to see what each run of W's or L's, when ordered by GameDate
(ordering is very important, of course), have in common. The answer may
not be obvious, but consider the following:
Each "run" of Results has the same number of results BEFORE it that are NOT equal to that result.
Let's look at all the W's to see how they can be broken into runs. Note that the first W, on 1/1/2000, has no results before it of any kind -- so it's "RunGroup" would be 0. The W on 1/15/2000 has 1 non-W result before it, as does the W's on 1/17 and 1/22. They all get a "RunGroup" of 1. The W on 2/5/2000 has 2 non-W results before it, so it gets a RunGroup of 2.
We can calculate our RunGroup field many ways, here's one way to do it that is pretty easy to understand (let's call the following code "SQL 1"):
FROM GameResults G
WHERE G.Result <> GR.Result
AND G.GameDate <= GR.GameDate) as RunGroup
FROM GameResults GR
Looking at the results from the above query, we get:
GameDate Result RunGroup
1/1/2000 W 0
1/12/2000 L 1
1/15/2000 W 1
1/17/2000 W 1
1/22/2000 W 1
2/1/2000 L 4
2/5/2000 W 2
2/8/2000 L 5
2/16/2000 W 3
2/19/2000 L 6
2/25/2000 L 6
2/28/2000 L 6
3/15/2000 L 6
3/19/2000 W 7
3/25/2000 W 7
Next, we can determine how many games in each run, and the start and end game, with a simple GROUP BY of the above results (call this "SQL 2") :
MIN(GameDate) as StartDate,
MAX(GameDate) as EndDate,
COUNT(*) as Games
FROM (SQL 1) A
GROUP BY Result, RunGroup
ORDER BY Min(GameDate)
Note in the above that we are grouping by RunGroup, but not selecting RunGroup as a field in our resultset. The RunGroup field is really of no use to us to return, it's just used by the GROUP BY clause to separate each run into it's own group. (ed: Here's an article on derived tables)
Result StartDate EndDate Games
W 1/1/2000 1/1/2000 1
L 1/12/2000 1/12/2000 1
W 1/15/2000 1/22/2000 3
L 2/1/2000 2/1/2000 1
W 2/5/2000 2/5/2000 1
L 2/8/2000 2/8/2000 1
W 2/16/2000 2/16/2000 1
L 2/19/2000 3/15/2000 4
W 3/19/2000 3/25/2000 2
Notice how the W's and L's are all grouped together, and we now have the length and start and end date of each run or streak.
Finally, we can now get some more answers:
What was the longest winning streak of the year?
SELECT TOP 1 *
FROM (SQL 2) A
ORDER BY Games DESC
Where Result = 'W'
What is the current streak as of the last game? (Very common in standings in the sports section)
FROM (SQL2 ) A
WHERE EndDate = (SELECT Max(EndDate)
How many streaks, of 3 games or more, did we have?
COUNT(*) as NumberOfStreaks
FROM (SQL 2) A
GROUP BY Result
WHERE Games >= 3
Doing this for multiple teams or divisions or whatever you need just requires more fields to be included as part of your GROUP BY expression. That is, to track this for multiple teams, you wouldn't GROUP BY just "Result" in all of the queries, but rather by "TeamID, Result".