Detecting "Runs" or "Streaks" in your data

By Jeff Smith on 16 February 2003 | Tags: SELECT

This article comes from Jeff Smith. Jeff writes "It can be difficult in a set-based language such as SQL to determine when a number of records in a row (when ordered by a unique key) have the same values in a particular field. For example, you may have a table of GameResults, with fields such as GameDate (primary key) and result (W for win, L for loss), and you may wish to find out what kinds of winning or losing streaks exist in your data."

Here's our sample table:
GameDate    Result
1/1/2000    W
1/12/2000   L
1/15/2000   W
1/17/2000   W
1/22/2000   W
2/1/2000    L
2/5/2000    W
2/8/2000    L
2/16/2000   W
2/19/2000   L
2/25/2000   L
2/28/2000   L
3/15/2000   L
3/19/2000   W
3/25/2000   W

Looking at the above table, it is easy to get what our "Win-Loss" record is for a period of time:

      SUM(CASE WHEN Result = 'L' THEN 1 ELSE 0 END) as Losses FROM GameResults

But how we do determine what was the longest winning streak, or losing streak, in that time period? Or, as of the last date, what the current streak is?

The trick is to see what each run of W's or L's, when ordered by GameDate (ordering is very important, of course), have in common. The answer may not be obvious, but consider the following:

Each "run" of Results has the same number of results BEFORE it that are NOT equal to that result.

Let's look at all the W's to see how they can be broken into runs. Note that the first W, on 1/1/2000, has no results before it of any kind -- so it's "RunGroup" would be 0. The W on 1/15/2000 has 1 non-W result before it, as does the W's on 1/17 and 1/22. They all get a "RunGroup" of 1. The W on 2/5/2000 has 2 non-W results before it, so it gets a RunGroup of 2.

We can calculate our RunGroup field many ways, here's one way to do it that is pretty easy to understand (let's call the following code "SQL 1"):

SELECT GameDate, 
   FROM GameResults G 
   WHERE G.Result <> GR.Result 
   AND G.GameDate <= GR.GameDate) as RunGroup 
FROM GameResults GR

Looking at the results from the above query, we get:

GameDate    Result    RunGroup
1/1/2000    W         0
1/12/2000   L         1
1/15/2000   W         1
1/17/2000   W         1
1/22/2000   W         1
2/1/2000    L         4
2/5/2000    W         2
2/8/2000    L         5
2/16/2000   W         3
2/19/2000   L         6
2/25/2000   L         6
2/28/2000   L         6
3/15/2000   L         6
3/19/2000   W         7
3/25/2000   W         7

Next, we can determine how many games in each run, and the start and end game, with a simple GROUP BY of the above results (call this "SQL 2") :

SELECT Result, 
  MIN(GameDate) as StartDate, 
  MAX(GameDate) as EndDate, 
  COUNT(*) as Games
GROUP BY Result, RunGroup
ORDER BY Min(GameDate)

Note in the above that we are grouping by RunGroup, but not selecting RunGroup as a field in our resultset. The RunGroup field is really of no use to us to return, it's just used by the GROUP BY clause to separate each run into it's own group. (ed: Here's an article on derived tables)

That returns:

Result      StartDate   EndDate     Games
W           1/1/2000    1/1/2000    1
L           1/12/2000   1/12/2000   1
W           1/15/2000   1/22/2000   3
L           2/1/2000    2/1/2000    1
W           2/5/2000    2/5/2000    1
L           2/8/2000    2/8/2000    1
W           2/16/2000   2/16/2000   1
L           2/19/2000   3/15/2000   4
W           3/19/2000   3/25/2000   2

Notice how the W's and L's are all grouped together, and we now have the length and start and end date of each run or streak.

Finally, we can now get some more answers:

What was the longest winning streak of the year?

Where Result = 'W'

What is the current streak as of the last game? (Very common in standings in the sports section)

WHERE EndDate = (SELECT Max(EndDate) 
FROM GameResults)

How many streaks, of 3 games or more, did we have?

SELECT Result, 
  COUNT(*) as NumberOfStreaks 
GROUP BY Result 
WHERE Games >= 3

Doing this for multiple teams or divisions or whatever you need just requires more fields to be included as part of your GROUP BY expression. That is, to track this for multiple teams, you wouldn't GROUP BY just "Result" in all of the queries, but rather by "TeamID, Result".

Related Articles

Joining to the Next Sequential Row (2 April 2008)

Writing Outer Joins in T-SQL (11 February 2008)

How to Use GROUP BY with Distinct Aggregates and Derived tables (31 July 2007)

How to Use GROUP BY in SQL Server (30 July 2007)

SQL Server 2005: Using OVER() with Aggregate Functions (21 May 2007)

Server Side Paging using SQL Server 2005 (4 January 2007)

Using XQuery, New Large DataTypes, and More (9 May 2006)

Counting Parents and Children with Count Distinct (10 January 2006)

Other Recent Forum Posts

Using case statement in where clause (12h)

Combine records based on date in between date columns as a continuous single record (1d)

Need Help in SQL query optimization (4d)

Info needed on field "user_scans" of table "sys.dm_db_index_usage_stats" (4d)

Suspect Parameter Sniffing? (5d)

See values of a proc when it is called from another proc? (5d)

Varchar() datatype and C# SqlDataReader truncates string (6d)

Find the extra rows (6d)

- Advertisement -