SQL Server Forums
Article Discussion: Calculating Running Totals

AskSQLTeam
Ask SQLTeam Question

USA
0 Posts

Posted - 04/30/2001 :  10:51:49
Roger writes "Is there a way to retrieve a field value from the previously read row in order to use it to calculate a field in the current row . . ."

Article Link.

Merkin
Funky Drop Bear Fearing SQL Dude!

Australia
4970 Posts

Posted - 05/08/2001 :  01:26:54
Interesting

Hmmmm

Very interesting about the cursor being quicker. Just proves that there never is one hard and fast rule for anything.


JustinBigelow
SQL Gigolo

USA
1157 Posts

Posted - 05/08/2001 :  11:28:42
Why the need for coalesce?

When I started to read the article I decided to try my own solution first and compare it to what the author came up with. My solution is as follows:

select s1.daycount, s1.sales,
RunningTotal = (select sum(s2.sales) from sales s2
where s2.daycount <= s1.daycount)
from sales s1
order by s1.daycount, s1.sales

I came up with this after refreshing myself on correlated subqueries in "Professional SQL Server 7.0 Programming", so I guess you could call this the "Wrox Solution" ;)

This is pretty close to the "Celko" solution, so I timed both in QA, and the Wrox beat the Celko by a whopping ONE SECOND (heavy sarcasm)! I don't understand the need for the COALESCE function; they both seem to produce the same results. Would somebody mind enlightening me? Thanks.

Justin
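
One likely explanation, assuming the article's version compares with a strict < rather than <=: for the first row the subquery finds no rows, SUM returns NULL, and COALESCE turns that NULL into 0. With <= the current row is always included, so the sum is never NULL. A sketch of the strict variant:

select s1.daycount, s1.sales,
-- coalesce turns the first row's NULL subtotal into 0
RunningTotal = coalesce((select sum(s2.sales) from sales s2
                         where s2.daycount < s1.daycount), 0) + s1.sales
from sales s1
order by s1.daycount, s1.sales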


Roger
Starting Member

0 Posts

Posted - 05/08/2001 :  12:01:21
Not exactly what I was looking for

When I asked about the cumulative total it was just my attempt to make a simple example where I would need to keep values from the previous row in order to use them in the next row. The actual calculation that I need to do is quite a bit more complicated.

This is my query using MySQL:
SET @f := 0;
SET @t := 0;
SET @s := 0;
SET @cf := 0;

select day,
@cf := 0.4 * @s + (0.6 * (@f + @t)) as forecast,
@t := 0.2 * (@cf - @f) + (0.8 * @t) as trend,
@s := sum(total) as total,
@f := if(@cf = 0, sum(total), @cf) as hold
from MyTable
where tag = 1
and crcid = 1876078342
and day between 10995 and 11140
group by day;

MySQL allows variables to be set dynamically within a query, which SQL Server does not.
I need to know how to get the same results using SQL Server.
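
One way to get there in T-SQL is the double-assignment UPDATE that comes up later in this thread. A sketch, reusing Roger's table and column names; note that it relies on the UPDATE following the clustered index in day order, which works in practice but is not formally guaranteed by the documentation:

CREATE TABLE #calc ([day] int PRIMARY KEY CLUSTERED,
                    total float, forecast float, trend float)

INSERT INTO #calc ([day], total, forecast, trend)
SELECT [day], SUM(total), 0, 0
FROM MyTable
WHERE tag = 1 AND crcid = 1876078342 AND [day] BETWEEN 10995 AND 11140
GROUP BY [day]

DECLARE @f float, @t float, @s float, @cf float
SELECT @f = 0, @t = 0, @s = 0, @cf = 0

-- each row is visited once; the variables carry the previous row's values
UPDATE #calc
SET @cf = forecast = 0.4 * @s + 0.6 * (@f + @t),
    @t = trend = 0.2 * (@cf - @f) + 0.8 * @t,
    @s = total,
    @f = CASE WHEN @cf = 0 THEN total ELSE @cf END

SELECT [day], forecast, trend, total FROM #calc ORDER BY [day]

DROP TABLE #calc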


colinm
Yak Posting Veteran

United Kingdom
62 Posts

Posted - 05/10/2001 :  12:04:33
Another way

Would be to use a trigger, and calculate the total each time a row is inserted. Then you would always have the cumulative total, but it would slow down inserts slightly.
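
A minimal sketch of the trigger idea, assuming the article's Sales table with an added RunningTotal column; it recomputes totals only for the newly inserted rows, so a back-dated insert would still leave later rows stale:

CREATE TRIGGER trSales_RunningTotal ON Sales
FOR INSERT
AS
-- set each inserted row's total to the sum of everything up to its day
UPDATE s
SET RunningTotal = (SELECT SUM(s2.Sales)
                    FROM Sales s2
                    WHERE s2.DayCount <= s.DayCount)
FROM Sales s
JOIN inserted i ON i.DayCount = s.DayCount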


nr
SQLTeam MVY

United Kingdom
12543 Posts

Posted - 05/17/2001 :  19:25:09
You can carry a value forward from the previous row with something like this (put in your own calculation):

declare @s varchar(1000)
update tbl
set @s = x = coalesce(@s, x)

Unfortunately UPDATE doesn't allow an ORDER BY, so it's not always useful.




Edited by - nr on 05/17/2001 19:27:01

richardd
Starting Member

United Kingdom
3 Posts

Posted - 05/30/2001 :  04:53:31
Try this:

CREATE TABLE #Sales (DayCount smallint, Sales money, RunningTotal money)

INSERT INTO #Sales (DayCount, Sales, RunningTotal)
SELECT DayCount, Sales, 0 FROM Sales
ORDER BY DayCount

DECLARE @RunningTotal money
SET @RunningTotal = 0

UPDATE #Sales
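-- double assignment: @RunningTotal + Sales is stored in the variable and then in the column, one row at a time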
SET @RunningTotal = RunningTotal = @RunningTotal + Sales

SELECT * FROM #Sales ORDER BY DayCount

DROP TABLE #Sales

In SQL Server 2000, use a table variable instead of the temp table.
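
For reference, a sketch of that table-variable version (same columns, SQL Server 2000 syntax):

DECLARE @Sales TABLE (DayCount smallint, Sales money, RunningTotal money)

INSERT INTO @Sales (DayCount, Sales, RunningTotal)
SELECT DayCount, Sales, 0 FROM Sales
ORDER BY DayCount

DECLARE @RunningTotal money
SET @RunningTotal = 0

UPDATE @Sales
SET @RunningTotal = RunningTotal = @RunningTotal + Sales

SELECT * FROM @Sales ORDER BY DayCount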


mono
Starting Member

United Kingdom
36 Posts

Posted - 09/05/2001 :  10:54:39
The reason for the performance difference is that the Celko solution and its variants (all the v1 < v2 ones) are quadratic. To compute the total for the Nth record, it computes the total of the N-1 records below it. When it moves on to the next total, it redoes all that work and adds the Nth value. On average, each total requires N/2 records to be accessed. Run it against a dataset twice the size and it should take about four times as long. Not acceptable where the volume of data can increase significantly over time.

The cursor approach visits each record just once.

The update approach is interesting. The only problem is ensuring that the records come in the right order. I am considering copying the data into a temp table with a clustered index on the ordered field to force the records into the required order, but the docs do not guarantee that the update will take the records in clustered index order. And you can't have an ORDER BY in an update.
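
richardd's version above leans on the ORDER BY during the insert; the variation described here pins the order with a clustered index instead. A sketch, with the stated caveat that the docs don't promise the update walks the index in order:

CREATE TABLE #Sales (DayCount smallint PRIMARY KEY CLUSTERED,
                     Sales money, RunningTotal money)

INSERT INTO #Sales (DayCount, Sales, RunningTotal)
SELECT DayCount, Sales, 0 FROM Sales

DECLARE @RunningTotal money
SET @RunningTotal = 0

UPDATE #Sales
SET @RunningTotal = RunningTotal = @RunningTotal + Sales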


quote:

Interesting

Hmmmm

Very interesting about the cursor being quicker. Just proves that there never is one hard and fast rule for anything.






julianhaynes
Starting Member

United Kingdom
3 Posts

Posted - 02/21/2002 :  12:40:23
I prefer to avoid CROSS joins...

SELECT
a.DayCount
, a.Sales
, SUM(b.Sales) AS RunningTotal
FROM
Sales a
JOIN Sales b ON b.DayCount <= a.DayCount
GROUP BY
a.DayCount
, a.Sales
ORDER BY
a.DayCount
, a.Sales

... but of course this is the same...
Scan count 5000, logical reads 37544, physical reads 0, read-ahead reads 0.


mschweisguth
Starting Member

1 Posts

Posted - 05/12/2002 :  18:57:04
What I am thinking is: you can use ORDER BY on a read-only cursor and then insert the results into a new temp table.
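
A sketch of that cursor-plus-temp-table approach against the article's Sales table; the ORDER BY on the cursor makes the accumulation order explicit, which sidesteps the UPDATE ordering worry:

CREATE TABLE #Totals (DayCount smallint, Sales money, RunningTotal money)

DECLARE @DayCount smallint, @Sales money, @RunningTotal money
SET @RunningTotal = 0

DECLARE curSales CURSOR FAST_FORWARD FOR
SELECT DayCount, Sales FROM Sales ORDER BY DayCount

OPEN curSales
FETCH NEXT FROM curSales INTO @DayCount, @Sales
WHILE @@FETCH_STATUS = 0
BEGIN
    -- one pass: add this row's sales and store the row with its total
    SET @RunningTotal = @RunningTotal + @Sales
    INSERT INTO #Totals VALUES (@DayCount, @Sales, @RunningTotal)
    FETCH NEXT FROM curSales INTO @DayCount, @Sales
END
CLOSE curSales
DEALLOCATE curSales

SELECT * FROM #Totals ORDER BY DayCount
DROP TABLE #Totals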

quote:

The reason for the performance difference is that the Celko solution and its variants (all the v1 < v2 ones) are quadratic. To compute the total for the Nth record, it computes the total of the N-1 records below it. When it moves on to the next total, it redoes all that work and adds the Nth value. On average, each total requires N/2 records to be accessed. Run it against a dataset twice the size and it should take about four times as long. Not acceptable where the volume of data can increase significantly over time.

The cursor approach visits each record just once.

The update approach is interesting. The only problem is ensuring that the records come in the right order. I am considering copying the data into a temp table with a clustered index on the ordered field to force the records into the required order, but the docs do not guarantee that the update will take the records in clustered index order. And you can't have an ORDER BY in an update.





richardd
Starting Member

United Kingdom
3 Posts

Posted - 05/13/2002 :  05:28:23
quote:
You received the following message from mschweisguth:

I was looking at your SQL. I once did something like this, but, I was afraid that it was undocumented behavior.

Primarily since I remembered reading something, somewhere, that said that even if you add a clustered index to a table, a select * statement isn't guaranteed to return ordered output unless you use an order by clause.

Do you know if this is specified someplace in SQL92?

-thanks.


Michael,

I have no idea! I just know it works in MS SQL, and I don't use anything else. Maybe someone else can help.

Cheers,
Richard

merry
Starting Member

1 Posts

Posted - 05/15/2003 :  18:50:20
Hi, I am a self-taught SQL Server developer. I am so amazed at the cool solutions you guys throw out for problems that can be reasonably complex.

Can you please share what makes one comfortable writing down these queries? Is it just common sense, or intelligence, or remembering some basic SQL tricks and building on them, or...?

I bet none of the cool solutions I see on this site arrived in the author's mind as they appear here. What method or mind-set should one be in to turn ideas into SQL queries?

Show me the light. Sorry if my attempt to not sound desperate failed in this post, but I want to be good at the art.

Send email straight to raju_sql@yahoo.com.


AjarnMark
SQL Slashing Gunting Master

USA
3246 Posts

Posted - 05/19/2003 :  14:24:19
Merry, I'd say the bulk of it comes from "years of experience". I know that's not the answer most people want to hear, but the fact is that the more you work with SQL, the more you start thinking in terms of set-based operations. And the more experience you have, the more often you can say to yourself, "Now, I remember something kind of like this a while back, and we did something like ____." Then you test it, massage it, test it again, etc.

Of course, reading forums like this one, reading the articles posted here, and getting your hands on a couple of good books help lay a solid foundation. But nothing is quite as valuable as experience.

------------------------------------------------------
The more you know, the more you know you don't know.

GreySky
Starting Member

USA
20 Posts

Posted - 05/19/2003 :  15:48:14
I was wondering if SQL Server 2000 *functions* can be used to emulate the in-line functions I describe below.

---------------------
In the MS Access Newsgroup (the one sponsored by Microsoft), there are often questions on how to perform "cumulative" whatevers.

As *all* are aware, MS Access doesn't have the Stored Procedure capabilities SQL Server has. However, Jet is extended to allow the use of *functions* that greatly enhance query capabilities.

The "Celko" method is commonly used in Access. However, it has major performance issues when large datasets are involved, and of course isn't updatable.

The most efficient method for calculating in-line values for *very large* recordsets in MS Access is to have code/SQL that:

1) resets hold values used by inline functions;

2) manages how the Cartesian Product Table is built within MS Access (either by TOP 100 PERCENT clauses within subqueries or managed CPT builds by using base tables whose joins dictate the CPT build order); and

3) executes a query that calls an in-line function (described below).

These steps are a modified version of the approach explained below, in that the query is initiated from code rather than the code being initiated by the query.

Below is how to open queries that then perform cumulative functions in MS Access. While written for MS Access people, it still contains a few ideas not discussed here.

---------------CUMULATIVES FOR MS ACCESS -------------------
There are essentially two competing methods to accomplish accumulations by performing line-by-line analysis, each with their advantages and drawbacks. The two methods are (a) queries, and (b) functions.

Questions that determine the appropriate answer:

-- How many records will be displayed as a result? 100 or fewer? Thousands?

-- Will the query need to be updatable?

-- Will the user scroll through these records?

-- Will the user need to make changes and have the calculations immediately update?

-- Will the user reorder the records, thus requiring the recalculation of the cumulative totals?

-- Are there groupings? In other words, will you be looking for the last cumulative value after Group A, then after Group B, etc.?

------------

An overview of the competing methodologies:

tblOrderItems
--OrderItemID - AutoNumber
--OrderItem - Text 255
--OrderValue - Currency

Version 1: Queries Only

There are a few ways of implementing this as a subquery. Here, I provide an example of using a subquery within the select statement to return the cumulative values up to that point.

SELECT MainOrderItems.OrderItemID, MainOrderItems.OrderValue, (Select Sum([OrderValue]) From tblOrderItems As SubQuery Where SubQuery.[OrderItemID] <= MainOrderItems.OrderItemID) AS Cumulative
FROM tblOrderItems AS MainOrderItems
ORDER BY MainOrderItems.OrderItemID;

Advantages:
--This version is fully encapsulated within SQL
--SQL geeks love it because it makes them feel uber and reinforces their opinion that code is for the uninitiated

Disadvantages:
--Doesn't allow for updates*
--The displayed sort order must be managed
--Don't try this with thousands of records if you want an answer anytime soon

*OK, OK, so there IS a way to make the results updatable. Use DSum in the select statement instead of a subquery. For example:

SELECT tblOrderItems.*, nz(DSum("[OrderValue]","tblOrderItems"," [OrderItemID]<" & [tblOrderItems].[OrderItemID]))+[OrderValue] AS Cumulative
FROM tblOrderItems;

This combination allows the cumulative calculation to work while keeping the results updatable. AND, the numbers will automatically recalculate if you make a change. If you don't want the automatic recalculation, you can use a DLookup against the first query that uses the subquery.

Version 2: Functions

There are two distinct ways to utilize functions. I break them down into (a) inline functions and (b) non-inline functions, or in other words: (a) hard but blazingly fast and (b) somewhat easy and flexible but slow.

2(a) Inline functions take the current value of the record, perform a mathematical (or other) function, "store" the result of the calculation, and return the answer for display.

Inline functions use the "stored" value for the next call. Inline functions can be used in queries, forms, and reports.

Advantages:
--Absolute performance screamers. Inline functions are hands-down the fastest executors.
--Able to easily handle the largest record counts; linear time requirement growth

Disadvantages:
--Hardest to manage*
--Is best used in conjunction with Aggregate functions, such as Last, with or without Group By.**
--Not designed for scrolling, but you can do it if you really have to***

*Inline functions carry the additional requirement of resetting the "stored" value from execution to execution. Within queries, this can be achieved by having one query return the first record; if the record being executed is the first record, the function resets the stored value. For reports, it can be reset in Report_Open. For forms, it usually isn't best to use inline functions -- use non-inline functions or queries.

**Inline functions are superior at returning an aggregate result based upon line-by-line analysis in an incredibly fast manner (quite possibly unbeatable speed-wise, even compared against code using cursors). Essentially, you create a query that uses the inline function. Then you create another query that performs the aggregation against the first query.

If function execution order is important, there's an important caveat. The query that runs the inline function must be based upon a query that includes the SELECT TOP 100 PERCENT ... keywords. "TOP 100 PERCENT" ensures that your stated ORDER BY is performed against the subquery. This ensures the inline function is called in the exact order you demand (such as forcing groups to execute in order). The aggregate query can then simply take the "last" of the inline query to return the blitzkrieg-fast results.

Note: Cumulative multiplication in conjunction with aggregate functions does not require a function. You can use an expression similar to this one: IIf(Sum(IIf([Value]<0,1,0)) Mod 2=1,-1,1)*Sgn(Min(Abs([Value])))*Exp(Sum(Log(Abs(IIf([Value]=0,1,[Value])))))
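
That same product-via-logarithms trick ports to T-SQL. A sketch, assuming the tblCompanies/CoGroupID/Value names used below:

SELECT CoGroupID,
       -- restore the sign from the count of negative values
       CASE WHEN SUM(CASE WHEN Value < 0 THEN 1 ELSE 0 END) % 2 = 1
            THEN -1 ELSE 1 END
       -- zero out the product if any Value is zero
       * MIN(SIGN(ABS(Value)))
       -- EXP(SUM(LOG(...))) rebuilds the product of absolute values
       * EXP(SUM(LOG(ABS(CASE WHEN Value = 0 THEN 1 ELSE Value END))))
       AS GroupProduct
FROM tblCompanies
GROUP BY CoGroupID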

***While technically the results can be set to be updatable, it just doesn't work. This is because the function is called, as the user scrolls through the recordset, for the row the user is on. This wreaks absolute havoc with "stored" variables and will return unexpected values. If you are going to scroll the results, you *should* set the result type to Snapshot.

If in-line functions must be used with a scrollable and updatable recordset, then you must implement methods that dramatically hamper performance. This is because you have to forget about using stored values and dynamically calculate the proper result based upon the user's position within the recordset.

Here's an example of an in-line function. This function performs a cumulative multiply, and resets the value when a new CoGroupID is passed. (In the real world, this should be performed using the expression I listed above; this is for illustration only.)

Function InlineMultiply(lngInlineValue As Long, bInitialize As Boolean, lngCoGroupID As Long) As Long
On Error GoTo Err_Overflow

Static lngHoldValue As Long
Static lngHoldCoGroup As Long
Static lngHoldCompanyID As Long

' determine if to initialize
If bInitialize Then
lngHoldValue = 1
lngHoldCoGroup = lngCoGroupID
lngHoldCompanyID = 0
End If

' determine if a different co group passed (requires resetting)
If lngHoldCoGroup <> lngCoGroupID Then
lngHoldValue = 1
lngHoldCoGroup = lngCoGroupID
End If

' return the hold value * inline value
lngHoldValue = lngHoldValue * lngInlineValue
InlineMultiply = lngHoldValue

Exit_InlineMultiply:
Exit Function

Err_Overflow:
InlineMultiply = 0
lngHoldValue = 1
lngHoldCoGroup = 0
Resume Exit_InlineMultiply

End Function

The first query involved utilizes the TOP 100 PERCENT keywords to force the appropriate execution order of the function. Otherwise, there would be no guarantee that all CoGroupID 1's would execute before any CoGroupID 2's were passed (thus resetting the value prematurely).

SELECT TOP 100 PERCENT tblCompanies.*
FROM tblCompanies
WHERE tblCompanies.Value Is Not Null
ORDER BY tblCompanies.CoGroupID, tblCompanies.CompanyName;
-- note the ORDER BY on CoGroupID, in conjunction with TOP 100 PERCENT

--This query forces the Order By to be performed, thanks to the TOP 100 PERCENT keywords. This is then used as a subquery in the next query that actually calls the inline function:

SELECT qrySortedCompanyData.CompanyID, qrySortedCompanyData.CoGroupID, qrySortedCompanyData.CompanyName, qrySortedCompanyData.Value, InlineMultiply([Value],[qrySortedCompanyData].[CompanyID]=[qryFirstCompany].[CompanyID],[qrySortedCompanyData].[CoGroupID]) AS TheInline
FROM qrySortedCompanyData, qryFirstCompany
ORDER BY qrySortedCompanyData.CoGroupID, qrySortedCompanyData.CompanyName;

--This query can then be used to make summaries:

SELECT qryRunningMultiply.CoGroupID, Last(qryRunningMultiply.TheInline) AS LastValue
FROM qryRunningMultiply
GROUP BY qryRunningMultiply.CoGroupID;

-- This summary shows the cumulative value for each group.

2(b) Non-Inline Functions.

Non-inline functions are chiefly designed for use on forms. They take the form's RecordsetClone and a bookmark, and perform their calculations against that recordset.

Advantages:
--Fully updatable recordsets
--Immediate recalculations
--Easy to write and maintain
--The function uses the current sort, even if the user changes it

Disadvantages:
--Not exactly speedy
--Not for thousands of records

The following non-inline function example returns the row number:

-TextBox controlsource:
=IIf(IsNull([TheValue]),Null,LineNumber([RecordsetClone],[Bookmark]))

-Non-inline Function:
Function LineNumber(rsc As Recordset, strBookmark As String) As Long

' align to the current bookmark and return its absolute position
rsc.Bookmark = strBookmark
LineNumber = rsc.AbsolutePosition + 1

End Function

This next example returns the cumulative sum of values showing on the datasheet.

-Textbox controlsource:
=IIf(IsNull([TheValue]),Null,CumulativeValues([RecordsetClone],[Bookmark],"TheValue"))

-Non-inline function:
Function CumulativeValues(rsc As Recordset, strBookmark As String, strFieldName As String) As Long

Dim lngTotal As Long

' align to the current bookmark
rsc.Bookmark = strBookmark

' move previous until bof encountered
While Not rsc.BOF
lngTotal = lngTotal + rsc.Fields(strFieldName)
rsc.MovePrevious
Wend

' return the value
CumulativeValues = lngTotal

End Function


----------
David Atkins, MCP


vhyn99
Starting Member

Philippines
3 Posts

Posted - 04/24/2004 :  11:07:51
nice code...it helps

//vhyn99

objectuser
Starting Member

14 Posts

Posted - 07/22/2004 :  08:46:11
I've recently needed running totals in an OLTP application. I was concerned about the performance, but it was a tradeoff between performance and redundant data. I was able to get the #2 query above to perform satisfactorily by limiting the range of rows.

I was not able to verify the results posted in the original reply. I found that the first solution (the temporary table) performed the worst of the bunch. On my laptop, it took 36 seconds. This compares to about 7 seconds with the correlated subquery.

As an aside, does anyone know if SQL Server will be adding analytic functions like Oracle has? They are a much better solution to this problem, performance-wise. I work in both SQL Server and Oracle, so I ran this test on Oracle (again on my laptop). For the correlated subquery solution the results were similar, about 7s. However, when I used Oracle's analytic functions, the results went down two orders of magnitude to about 70ms. Here's how it looks:

select day_count, sales,
sum(sales) over (order by day_count range unbounded preceding) running_total
from sales
order by day_count

I use these kinds of things all the time when I'm working with Oracle; they are very useful. I'm hoping that SQL Server will add something analogous soon.

--Kevin
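
(SQL Server did get there eventually: the OVER clause arrived in SQL Server 2005, and the ordered aggregate form in SQL Server 2012, at which point the query above translates almost verbatim:)

select day_count, sales,
       sum(sales) over (order by day_count rows unbounded preceding) as running_total
from sales
order by day_count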

parevr23
Starting Member

4 Posts

Posted - 04/17/2005 :  01:42:07
quote:
Originally posted by JustinBigelow

Why the need for coalesce?

When I started to read the article I decided to try my own solution first and compare it to what the author came up with. My solution is as follows:

select s1.daycount, s1.sales,
RunningTotal = (select sum(s2.sales) from sales s2
where s2.daycount <= s1.daycount)
from sales s1
order by s1.daycount, s1.sales

I came up with this after refreshing myself on correlated subqueries in "Professional SQL Server 7.0 Programming", so I guess you could call this the "Wrox Solution" ;)

This is pretty close to the "Celko" solution, so I timed both in QA, and the Wrox beat the Celko by a whopping ONE SECOND (heavy sarcasm)! I don't understand the need for the COALESCE function; they both seem to produce the same results. Would somebody mind enlightening me? Thanks.

Justin




Hey, I found this page on Google and I find it very interesting; you guys are great!
Along the same lines, I need some help with a running total issue. I have a table that holds a list of transactions. It is not a huge table; there will probably be a couple of hundred records a month or so, and I am not sure whether to calculate a running total and save it in the table for each transaction, or to run a query to get the data every time. What are the advantages of each?
My table looks something like this: Transaction(ID, Date, Description, Amount, Balance)
The issue I run into is that transactions do not arrive in insert order but in order of the Date field. I need to calculate the total based on the day in the Date field, not the day the record was entered.
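
A sketch of the query-it-every-time option, using the column names above and ID as a tiebreaker for transactions on the same day:

SELECT t1.ID, t1.Date, t1.Description, t1.Amount,
       (SELECT SUM(t2.Amount)
        FROM [Transaction] t2
        WHERE t2.Date < t1.Date
           OR (t2.Date = t1.Date AND t2.ID <= t1.ID)) AS Balance
FROM [Transaction] t1
ORDER BY t1.Date, t1.ID

With a couple of hundred rows a month, the quadratic cost of the subquery won't hurt for a long time, and it avoids storing a Balance column that goes stale whenever a back-dated transaction arrives.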

bojan.jovicic
Starting Member

Yugoslavia
1 Posts

Posted - 03/28/2006 :  06:13:52
quote:
Originally posted by mono

The cursor approach visits each record just once.

The update approach is interesting. The only problem is ensuring that the records come in the right order. I am considering copying the data into a temp table with a clustered index on the ordered field to force the records into the required order, but the docs do not guarantee that the update will take the records in clustered index order. And you can't have an ORDER BY in an update.






I have found another article discussing exactly that solution: http://www.sql-server-performance.com/mm_cursor_friendly_problem.asp

It looks like the update order is guaranteed with the techniques used there.

--
Bojan Jovicic
B. Sc. IT Engineer
Microsoft Certified Solution Developer
Microsoft Business Solutions Certified Professional

Edited by - bojan.jovicic on 03/28/2006 07:13:13

stevekgoodwin
Starting Member

3 Posts

Posted - 02/26/2007 :  01:25:42
Nice article. There's a bug in the CROSS JOIN example: the phrase "AS RunningTotal" needs to be cut'n'pasted three lines higher so that it reads:

SUM(Sales) As RunningTotal

ismailc
Constraint Violating Yak Guru

South Africa
286 Posts

Posted - 01/05/2009 :  08:22:26
Great Article

Is there a way of resetting the running total to 0 based on certain criteria and then continuing the calculation?

When the date is the 1st of the month (Split_Date LIKE '01%'), Accum_Tot should be reset to 0.

Regards
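
A sketch of one way, reusing the double-assignment trick from earlier in the thread; Split_Date and Accum_Tot come from the question above, while the #Totals table and its Amount column are assumed:

DECLARE @Accum money
SET @Accum = 0

UPDATE #Totals
SET @Accum = Accum_Tot = CASE WHEN Split_Date LIKE '01%'
                              THEN Amount            -- 1st of the month: restart
                              ELSE @Accum + Amount   -- otherwise keep accumulating
                         END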

SwePeso
Patron Saint of Lost Yaks

Sweden
30105 Posts

Posted - 01/05/2009 :  09:02:50
As suggested before, ismailc, see http://weblogs.sqlteam.com/peterl/archive/2008/11/26/Finding-streaks-in-data.aspx



E 12°55'05.63"
N 56°04'39.26"