SQL Server Forums
Article Discussion: Calculating Running Totals

AskSQLTeam
Ask SQLTeam Question

USA
0 Posts

Posted - 04/30/2001 :  10:51:49
Roger writes "Is there a way to retrieve a field value from the previously read row in order to use it to calculate a field in the current row . . ."

Article Link.

Merkin
Funky Drop Bear Fearing SQL Dude!

Australia
4970 Posts

Posted - 05/08/2001 :  01:26:54
Interesting

Hmmmm

Very interesting about the cursor being quicker. Just proves that there never is one hard and fast rule for anything.


JustinBigelow
SQL Gigolo

USA
1157 Posts

Posted - 05/08/2001 :  11:28:42
Why the need for coalesce?

When I started to read the article I decided to try my own solution first and compare it to what the author came up with. My solution is as follows:

select s1.daycount, s1.sales,
RunningTotal = (select sum(s2.sales) from sales s2
where s2.daycount <= s1.daycount)
from sales s1
order by s1.daycount, s1.sales

I came up with this after refreshing myself on correlated subqueries in "Professional SQL Server 7.0 Programming", so I guess you could call this the "Wrox Solution" ;)

This is pretty close to the "Celko" solution, so I timed both in QA, and the Wrox beat the Celko by a whopping ONE SECOND (heavy sarcasm)! I don't understand the need for the COALESCE function; they both seem to produce the same results. Would somebody mind enlightening me? Thanks.

Justin
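
One likely explanation, assuming the article's version compares with a strict < rather than <=: for the first row the subquery finds no rows, SUM returns NULL, and COALESCE turns that NULL into 0. With <= the current row is always included, so the sum is never NULL. A sketch of the strict variant:

select s1.daycount, s1.sales,
-- coalesce turns the first row's NULL subtotal into 0
RunningTotal = coalesce((select sum(s2.sales) from sales s2
                         where s2.daycount < s1.daycount), 0) + s1.sales
from sales s1
order by s1.daycount, s1.sales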


Roger
Starting Member

0 Posts

Posted - 05/08/2001 :  12:01:21
Not exactly what I was looking for

When I asked about the cumulative total it was just my attempt to make a simple example where I would need to keep values from the previous row in order to use them in the next row. The actual calculation that I need to do is quite a bit more complicated.

This is my query using MySQL:
SET @f := 0;
SET @t := 0;
SET @s := 0;
SET @cf := 0;

select day,
@cf := 0.4 * @s + (0.6 * (@f + @t)) as forecast,
@t := 0.2 * (@cf - @f) + (0.8 * @t) as trend,
@s := sum(total) as total,
@f := if(@cf = 0, sum(total), @cf) as hold
from MyTable
where tag = 1
and crcid = 1876078342
and day between 10995 and 11140
group by day;

MySQL allows variables to be set dynamically within a query, which SQL Server does not.
I need to know how to get the same results using SQL Server.
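
One way to get there in T-SQL is the double-assignment UPDATE that comes up later in this thread. A sketch, reusing Roger's table and column names; note that it relies on the UPDATE following the clustered index in day order, which works in practice but is not formally guaranteed by the documentation:

CREATE TABLE #calc ([day] int PRIMARY KEY CLUSTERED,
                    total float, forecast float, trend float)

INSERT INTO #calc ([day], total, forecast, trend)
SELECT [day], SUM(total), 0, 0
FROM MyTable
WHERE tag = 1 AND crcid = 1876078342 AND [day] BETWEEN 10995 AND 11140
GROUP BY [day]

DECLARE @f float, @t float, @s float, @cf float
SELECT @f = 0, @t = 0, @s = 0, @cf = 0

-- each row is visited once; the variables carry the previous row's values
UPDATE #calc
SET @cf = forecast = 0.4 * @s + 0.6 * (@f + @t),
    @t = trend = 0.2 * (@cf - @f) + 0.8 * @t,
    @s = total,
    @f = CASE WHEN @cf = 0 THEN total ELSE @cf END

SELECT [day], forecast, trend, total FROM #calc ORDER BY [day]

DROP TABLE #calc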


colinm
Yak Posting Veteran

United Kingdom
62 Posts

Posted - 05/10/2001 :  12:04:33
Another way

Would be to use a trigger, and calculate the total each time a row is inserted. Then you would always have the cumulative total, but it would slow down inserts slightly.
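
A minimal sketch of the trigger idea, assuming the article's Sales table with an added RunningTotal column; it recomputes totals only for the newly inserted rows, so a back-dated insert would still leave later rows stale:

CREATE TRIGGER trSales_RunningTotal ON Sales
FOR INSERT
AS
-- set each inserted row's total to the sum of everything up to its day
UPDATE s
SET RunningTotal = (SELECT SUM(s2.Sales)
                    FROM Sales s2
                    WHERE s2.DayCount <= s.DayCount)
FROM Sales s
JOIN inserted i ON i.DayCount = s.DayCount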


nr
SQLTeam MVY

United Kingdom
12543 Posts

Posted - 05/17/2001 :  19:25:09
You can carry a value forward from the previous row with something like this (put in your own calculation):

declare @s varchar(1000)
update tbl
set @s = x = coalesce(@s, x)

Unfortunately UPDATE doesn't allow an ORDER BY, so it's not always useful.




Edited by - nr on 05/17/2001 19:27:01

richardd
Starting Member

United Kingdom
3 Posts

Posted - 05/30/2001 :  04:53:31
Try this:

CREATE TABLE #Sales (DayCount smallint, Sales money, RunningTotal money)

INSERT INTO #Sales (DayCount, Sales, RunningTotal)
SELECT DayCount, Sales, 0 FROM Sales
ORDER BY DayCount

DECLARE @RunningTotal money
SET @RunningTotal = 0

UPDATE #Sales
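-- double assignment: @RunningTotal + Sales is stored in the variable and then in the column, one row at a time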
SET @RunningTotal = RunningTotal = @RunningTotal + Sales

SELECT * FROM #Sales ORDER BY DayCount

DROP TABLE #Sales

In SQL Server 2000, use a table variable instead of the temp table.
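
For reference, a sketch of that table-variable version (same columns, SQL Server 2000 syntax):

DECLARE @Sales TABLE (DayCount smallint, Sales money, RunningTotal money)

INSERT INTO @Sales (DayCount, Sales, RunningTotal)
SELECT DayCount, Sales, 0 FROM Sales
ORDER BY DayCount

DECLARE @RunningTotal money
SET @RunningTotal = 0

UPDATE @Sales
SET @RunningTotal = RunningTotal = @RunningTotal + Sales

SELECT * FROM @Sales ORDER BY DayCount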


mono
Starting Member

United Kingdom
36 Posts

Posted - 09/05/2001 :  10:54:39
The reason for the performance difference is that the Celko solution and its variants (all the v1 < v2 ones) are quadratic. To compute the total for the Nth record, it computes the total of the N-1 records below it. When it moves on to the next total, it redoes all that work and adds the Nth value. On average, each total requires N/2 records to be accessed. Run it against a dataset twice the size and it should take about four times as long. Not acceptable where the volume of data can increase significantly over time.

The cursor approach visits each record just once.

The update approach is interesting. The only problem is ensuring that the records come in the right order. I am considering copying the data into a temp table with a clustered index on the ordered field to force the records into the required order, but the docs do not guarantee that the update will take the records in clustered index order. And you can't have an ORDER BY in an update.
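
richardd's version above leans on the ORDER BY during the insert; the variation described here pins the order with a clustered index instead. A sketch, with the stated caveat that the docs don't promise the update walks the index in order:

CREATE TABLE #Sales (DayCount smallint PRIMARY KEY CLUSTERED,
                     Sales money, RunningTotal money)

INSERT INTO #Sales (DayCount, Sales, RunningTotal)
SELECT DayCount, Sales, 0 FROM Sales

DECLARE @RunningTotal money
SET @RunningTotal = 0

UPDATE #Sales
SET @RunningTotal = RunningTotal = @RunningTotal + Sales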


quote:

Interesting

Hmmmm

Very interesting about the cursor being quicker. Just proves that there never is one hard and fast rule for anything.






julianhaynes
Starting Member

United Kingdom
3 Posts

Posted - 02/21/2002 :  12:40:23
I prefer to avoid CROSS joins...

SELECT
a.DayCount
, a.Sales
, SUM(b.Sales) AS RunningTotal
FROM
Sales a
JOIN Sales b ON b.DayCount <= a.DayCount
GROUP BY
a.DayCount
, a.Sales
ORDER BY
a.DayCount
, a.Sales

... but of course this is the same...
Scan count 5000, logical reads 37544, physical reads 0, read-ahead reads 0.


mschweisguth
Starting Member

1 Posts

Posted - 05/12/2002 :  18:57:04
What I am thinking is: you can use ORDER BY on a read-only cursor and then insert the results into a new temp table.
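
A sketch of that cursor-plus-temp-table approach against the article's Sales table; the ORDER BY on the cursor makes the accumulation order explicit, which sidesteps the UPDATE ordering worry:

CREATE TABLE #Totals (DayCount smallint, Sales money, RunningTotal money)

DECLARE @DayCount smallint, @Sales money, @RunningTotal money
SET @RunningTotal = 0

DECLARE curSales CURSOR FAST_FORWARD FOR
SELECT DayCount, Sales FROM Sales ORDER BY DayCount

OPEN curSales
FETCH NEXT FROM curSales INTO @DayCount, @Sales
WHILE @@FETCH_STATUS = 0
BEGIN
    -- one pass: add this row's sales and store the row with its total
    SET @RunningTotal = @RunningTotal + @Sales
    INSERT INTO #Totals VALUES (@DayCount, @Sales, @RunningTotal)
    FETCH NEXT FROM curSales INTO @DayCount, @Sales
END
CLOSE curSales
DEALLOCATE curSales

SELECT * FROM #Totals ORDER BY DayCount
DROP TABLE #Totals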

quote:

The reason for the performance difference is that the Celko solution and its variants (all the v1 < v2 ones) are quadratic. To compute the total for the Nth record, it computes the total of the N-1 records below it. When it moves on to the next total, it redoes all that work and adds the Nth value. On average, each total requires N/2 records to be accessed. Run it against a dataset twice the size and it should take about four times as long. Not acceptable where the volume of data can increase significantly over time.

The cursor approach visits each record just once.

The update approach is interesting. The only problem is ensuring that the records come in the right order. I am considering copying the data into a temp table with a clustered index on the ordered field to force the records into the required order, but the docs do not guarantee that the update will take the records in clustered index order. And you can't have an ORDER BY in an update.





richardd
Starting Member

United Kingdom
3 Posts

Posted - 05/13/2002 :  05:28:23
quote:
You received the following message from mschweisguth:

I was looking at your SQL. I once did something like this, but, I was afraid that it was undocumented behavior.

Primarily since I remembered reading something, somewhere, that said that even if you add a clustered index to a table, a select * statement isn't guaranteed to return ordered output unless you use an order by clause.

Do you know if this is specified someplace in SQL92?

-thanks.


Michael,

I have no idea! I just know it works in MS SQL, and I don't use anything else. Maybe someone else can help.

Cheers,
Richard

merry
Starting Member

1 Posts

Posted - 05/15/2003 :  18:50:20
Hi, I am a self-taught SQL Server developer. I am so amazed at the cool solutions you guys throw out for problems that can be reasonably complex.

Can you please share what makes one comfortable writing down these queries? Is it just common sense, or intelligence, or remembering some basic SQL tricks and building on them, or...?

I bet none of the cool solutions I see on this site arrived in the author's mind as they appear here. What method or mind-set should one be in to turn ideas into SQL queries?

Show me the light. Sorry if my attempt to not sound desperate failed in this post, but I want to be good at the art.

Send email straight to raju_sql@yahoo.com.


AjarnMark
SQL Slashing Gunting Master

USA
3246 Posts

Posted - 05/19/2003 :  14:24:19
Merry, I'd say the bulk of it comes from "years of experience". I know that's not the answer most people want to hear, but the fact is that the more you work with SQL, the more you start thinking in terms of set-based operations. And the more experience you have, the more often you can say to yourself, "Now, I remember something kind of like this a while back, and we did something like ____." Then you test it, massage it, test it again, etc.

Of course, reading forums like this one, reading the articles posted here, and getting your hands on a couple of good books help lay a solid foundation. But nothing is quite as valuable as experience.

------------------------------------------------------
The more you know, the more you know you don't know.

GreySky
Starting Member

USA
20 Posts

Posted - 05/19/2003 :  15:48:14
I was wondering if SQL Server 2000 *functions* can be used to emulate the in-line functions I describe below.

---------------------
In the MS Access Newsgroup (the one sponsored by Microsoft), there are often questions on how to perform "cumulative" whatevers.

As *all* are aware, MS Access doesn't have the Stored Procedure capabilities SQL Server has. However, Jet is extended to allow the use of *functions* that greatly enhance query capabilities.

The "Celko" method is commonly used in Access. However, it has major performance issues when large datasets are involved, and of course isn't updatable.

The most efficient method for calculating in-line values for *very large* recordsets in MS Access is to have code/SQL that:

1) resets hold values used by inline functions;

2) manages how the Cartesian Product Table is built within MS Access (either by TOP 100 PERCENT clauses within subqueries or managed CPT builds by using base tables whose joins dictate the CPT build order); and

3) executes a query that calls an in-line function (described below).

These steps are a modified version of the approach explained below, in that the query is initiated from code rather than the code being initiated by the query.

Below is how to open queries that then perform cumulative functions in MS Access. While written for MS Access people, it still contains a few ideas not discussed here.

---------------CUMULATIVES FOR MS ACCESS -------------------
There are essentially two competing methods to accomplish accumulations by performing line-by-line analysis, each with their advantages and drawbacks. The two methods are (a) queries, and (b) functions.

Questions that determine the appropriate answer:

-- How many records will be displayed as a result? 100 or fewer? Thousands?

-- Will the query need to be updatable?

-- Will the user scroll through these records?

-- Will the user need to make changes and have the calculations immediately update?

-- Will the user reorder the records, thus requiring the recalculation of the cumulative totals?

-- Are there groupings? In other words, will you be looking for the last cumulative value after Group A, then after Group B, etc.?

------------

An overview of the competing methodologies:

tblOrderItems
--OrderItemID - AutoNumber
--OrderItem - Text 255
--OrderValue - Currency

Version 1: Queries Only

There are a few ways of implementing this as a subquery. Here, I provide an example of using a subquery within the select statement to return the cumulative values up to that point.

SELECT MainOrderItems.OrderItemID, MainOrderItems.OrderValue, (Select Sum([OrderValue]) From tblOrderItems As SubQuery Where SubQuery.[OrderItemID] <= MainOrderItems.OrderItemID) AS Cumulative
FROM tblOrderItems AS MainOrderItems
ORDER BY MainOrderItems.OrderItemID;

Advantages:
--This version is fully encapsulated within SQL
--SQL geeks love it because it makes them feel uber and reinforces their opinion that code is for the uninitiated

Disadvantages:
--Doesn't allow for updates*
--The displayed sort order must be managed
--Don't try this with thousands of records if you want an answer anytime soon

*OK, OK, so there IS a way to make the results updatable. Use DSum in the select statement instead of a subquery. For example:

SELECT tblOrderItems.*, nz(DSum("[OrderValue]","tblOrderItems"," [OrderItemID]<" & [tblOrderItems].[OrderItemID]))+[OrderValue] AS Cumulative
FROM tblOrderItems;

This combination allows the cumulative calculation to work while keeping the results updatable. AND, the numbers will automatically recalculate if you make a change. If you don't want the automatic recalculation, you can use a DLookup against the first query that uses the subquery.

Version 2: Functions

There are two distinct ways to utilize functions. I break them down into (a) inline functions and (b) non-inline functions, or in other words: (a) hard but blazingly fast and (b) somewhat easy and flexible but slow.

2(a) Inline functions take the current value of the record, perform a mathematical (or other) function, "store" the result of the calculation, and return the answer for display.

Inline functions use the "stored" value for the next call. Inline functions can be used in queries, forms, and reports.

Advantages:
--Absolute performance screamers. Inline functions are hands-down the fastest executors.
--Able to easily handle the largest record counts; linear time requirement growth

Disadvantages:
--Hardest to manage*
--Is best used in conjunction with Aggregate functions, such as Last, with or without Group By.**
--Not designed for scrolling, but you can do it if you really have to***

*Inline functions carry the additional requirement of resetting the "stored" value from execution to execution. Within queries, this can be achieved by having one query return the first record; if the record being executed is the first record, the function resets the stored value. For reports, it can be reset in Report_Open. For forms, it usually isn't best to use inline functions -- use non-inline functions or queries.

**Inline functions are superior at returning an aggregate result based upon line-by-line analysis in an incredibly fast manner (quite possibly unbeatable speed-wise, even compared against code using cursors). Essentially, you create a query that uses the inline function. Then you create another query that performs the aggregation against the first query.

If function execution order is important, there's an important caveat. The query that runs the inline function must be based upon a query that includes the SELECT TOP 100 PERCENT ... keywords. "TOP 100 PERCENT" ensures that your stated ORDER BY is performed against the subquery. This ensures the inline function is called in the exact order you demand (such as forcing groups to execute in order). The aggregate query can then simply take the "last" of the inline query to return the blitzkrieg-fast results.

Note: Cumulative multiplication in conjunction with aggregate functions does not require a function. You can use an expression similar to this one: IIf(Sum(IIf([Value]<0,1,0)) Mod 2=1,-1,1)*Sgn(Min(Abs([Value])))*Exp(Sum(Log(Abs(IIf([Value]=0,1,[Value])))))
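
That same product-via-logarithms trick ports to T-SQL. A sketch, assuming the tblCompanies/CoGroupID/Value names used below:

SELECT CoGroupID,
       -- restore the sign from the count of negative values
       CASE WHEN SUM(CASE WHEN Value < 0 THEN 1 ELSE 0 END) % 2 = 1
            THEN -1 ELSE 1 END
       -- zero out the product if any Value is zero
       * MIN(SIGN(ABS(Value)))
       -- EXP(SUM(LOG(...))) rebuilds the product of absolute values
       * EXP(SUM(LOG(ABS(CASE WHEN Value = 0 THEN 1 ELSE Value END))))
       AS GroupProduct
FROM tblCompanies
GROUP BY CoGroupID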

***While technically the results can be set to be updatable, it just doesn't work. This is because the function is called, as the user scrolls through the recordset, for the row the user is on. This wreaks absolute havoc with "stored" variables and will return unexpected values. If you are going to scroll the results, you *should* set the result type to Snapshot.

If in-line functions must be used with a scrollable and updatable recordset, then you must implement methods that dramatically hamper performance. This is because you have to forget about using stored values and dynamically calculate the proper result based upon the user's position within the recordset.

Here's an example of an in-line function. This function performs a cumulative multiply, and resets the value when a new CoGroupID is passed. (In the real world, this should be performed using the expression I listed above; this is for illustration only.)

Function InlineMultiply(lngInlineValue As Long, bInitialize As Boolean, lngCoGroupID As Long) As Long
On Error GoTo Err_Overflow

Static lngHoldValue As Long
Static lngHoldCoGroup As Long
Static lngHoldCompanyID As Long

' determine if to initialize
If bInitialize Then
lngHoldValue = 1
lngHoldCoGroup = lngCoGroupID
lngHoldCompanyID = 0
End If

' determine if a different co group passed (requires resetting)
If lngHoldCoGroup <> lngCoGroupID Then
lngHoldValue = 1
lngHoldCoGroup = lngCoGroupID
End If

' return the hold value * inline value
lngHoldValue = lngHoldValue * lngInlineValue
InlineMultiply = lngHoldValue

Exit_InlineMultiply:
Exit Function

Err_Overflow:
InlineMultiply = 0
lngHoldValue = 1
lngHoldCoGroup = 0
Resume Exit_InlineMultiply

End Function

The first query involved utilizes the TOP 100 PERCENT keywords to force the appropriate execution order of the function. Otherwise, there would be no guarantee that all CoGroupID 1's would execute before any CoGroupID 2's were passed (thus resetting the value prematurely).

SELECT TOP 100 PERCENT tblCompanies.*
FROM tblCompanies
WHERE tblCompanies.Value Is Not Null
ORDER BY tblCompanies.CoGroupID, tblCompanies.CompanyName;
-- note the ORDER BY on CoGroupID, in conjunction with TOP 100 PERCENT

--This query forces the Order By to be performed, thanks to the TOP 100 PERCENT keywords. This is then used as a subquery in the next query that actually calls the inline function:

SELECT qrySortedCompanyData.CompanyID, qrySortedCompanyData.CoGroupID, qrySortedCompanyData.CompanyName, qrySortedCompanyData.Value, InlineMultiply([Value],[qrySortedCompanyData].[CompanyID]=[qryFirstCompany].[CompanyID],[qrySortedCompanyData].[CoGroupID]) AS TheInline
FROM qrySortedCompanyData, qryFirstCompany
ORDER BY qrySortedCompanyData.CoGroupID, qrySortedCompanyData.CompanyName;

--This query can then be used to make summaries:

SELECT qryRunningMultiply.CoGroupID, Last(qryRunningMultiply.TheInline) AS LastValue
FROM qryRunningMultiply
GROUP BY qryRunningMultiply.CoGroupID;

-- This summary shows the cumulative value for each group.

2(b) Non-Inline Functions.

Non-inline functions are chiefly designed for use on forms. They take the form's RecordsetClone and a bookmark, and perform their calculations against that recordset.

Advantages:
--Fully updatable recordsets
--Immediate recalculations
--Easy to write and maintain
--The function uses the current sort, even if the user changes it

Disadvantages:
--Not exactly speedy
--Not for thousands of records

The following non-inline function example returns the row number:

-TextBox controlsource:
=IIf(IsNull([TheValue]),Null,LineNumber([RecordsetClone],[Bookmark]))

-Non-inline Function:
Function LineNumber(rsc As Recordset, strBookmark As String) As Long

' align to the current bookmark and return its absolute position
rsc.Bookmark = strBookmark
LineNumber = rsc.AbsolutePosition + 1

End Function

This next example returns the cumulative sum of values showing on the datasheet.

-Textbox controlsource:
=IIf(IsNull([TheValue]),Null,CumulativeValues([RecordsetClone],[Bookmark],"TheValue"))

-Non-inline function:
Function CumulativeValues(rsc As Recordset, strBookmark As String, strFieldName As String) As Long

Dim lngTotal As Long

' align to the current bookmark
rsc.Bookmark = strBookmark

' move previous until bof encountered
While Not rsc.BOF
lngTotal = lngTotal + rsc.Fields(strFieldName)
rsc.MovePrevious
Wend

' return the value
CumulativeValues = lngTotal

End Function


----------
David Atkins, MCP


vhyn99
Starting Member

Philippines
3 Posts

Posted - 04/24/2004 :  11:07:51
nice code...it helps

//vhyn99

objectuser
Starting Member

14 Posts

Posted - 07/22/2004 :  08:46:11
I've recently needed running totals in an OLTP application. I was concerned about the performance, but it was a tradeoff between performance and redundant data. I was able to get the #2 query above to perform satisfactorily by limiting the range of rows.

I was not able to verify the results posted in the original reply. I found that the first solution (the temporary table) performed the worst of the bunch. On my laptop, it took 36 seconds. This compares to about 7 seconds with the correlated subquery.

As an aside, does anyone know if SQL Server will be adding analytic functions like Oracle has? They are a much better solution to this problem, performance-wise. I work in both SQL Server and Oracle, so I ran this test on Oracle (again on my laptop). For the correlated subquery solution the results were similar, about 7s. However, when I used Oracle's analytic functions, the results went down two orders of magnitude to about 70ms. Here's how it looks:

select day_count, sales,
sum(sales) over (order by day_count range unbounded preceding) running_total
from sales
order by day_count

I use these kinds of things all the time when I'm working with Oracle; they are very useful. I'm hoping that SQL Server will add something analogous soon.

--Kevin
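
(SQL Server did get there eventually: the OVER clause arrived in SQL Server 2005, and the ordered aggregate form in SQL Server 2012, at which point the query above translates almost verbatim:)

select day_count, sales,
       sum(sales) over (order by day_count rows unbounded preceding) as running_total
from sales
order by day_count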

parevr23
Starting Member

4 Posts

Posted - 04/17/2005 :  01:42:07
quote:
Originally posted by JustinBigelow

Why the need for coalesce?

When I started to read the article I decided to try my own solution first and compare it to what the author came up with. My solution is as follows:

select s1.daycount, s1.sales,
RunningTotal = (select sum(s2.sales) from sales s2
where s2.daycount <= s1.daycount)
from sales s1
order by s1.daycount, s1.sales

I came up with this after refreshing myself on correlated subqueries in "Professional SQL Server 7.0 Programming", so I guess you could call this the "Wrox Solution" ;)

This is pretty close to the "Celko" solution, so I timed both in QA, and the Wrox beat the Celko by a whopping ONE SECOND (heavy sarcasm)! I don't understand the need for the COALESCE function; they both seem to produce the same results. Would somebody mind enlightening me? Thanks.

Justin




Hey, I found this page on Google and I find it very interesting; you guys are great!
Along the same lines, I need some help with a running total issue. I have a table that holds a list of transactions. It is not a huge table; there will probably be a couple of hundred records a month or so, and I am not sure whether to calculate a running total and save it in the table for each transaction, or to run a query to get the data every time. What are the advantages of each?
My table looks something like this: Transaction(ID, Date, Description, Amount, Balance)
The issue I run into is that transactions do not arrive in insert order but in order of the Date field. I need to calculate the total based on the day in the Date field, not the day the record was entered.
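
A sketch of the query-it-every-time option, using the column names above and ID as a tiebreaker for transactions on the same day:

SELECT t1.ID, t1.Date, t1.Description, t1.Amount,
       (SELECT SUM(t2.Amount)
        FROM [Transaction] t2
        WHERE t2.Date < t1.Date
           OR (t2.Date = t1.Date AND t2.ID <= t1.ID)) AS Balance
FROM [Transaction] t1
ORDER BY t1.Date, t1.ID

With a couple of hundred rows a month, the quadratic cost of the subquery won't hurt for a long time, and it avoids storing a Balance column that goes stale whenever a back-dated transaction arrives.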

bojan.jovicic
Starting Member

Yugoslavia
1 Posts

Posted - 03/28/2006 :  06:13:52
quote:
Originally posted by mono

The cursor approach visits each record just once.

The update approach is interesting. The only problem is ensuring that the records come in the right order. I am considering copying the data into a temp table with a clustered index on the ordered field to force the records into the required order, but the docs do not guarantee that the update will take the records in clustered index order. And you can't have an ORDER BY in an update.






I have found another article discussing exactly that solution: http://www.sql-server-performance.com/mm_cursor_friendly_problem.asp

It looks like the update order is guaranteed with the techniques used there.

--
Bojan Jovicic
B. Sc. IT Engineer
Microsoft Certified Solution Developer
Microsoft Business Solutions Certified Professional

Edited by - bojan.jovicic on 03/28/2006 07:13:13

stevekgoodwin
Starting Member

3 Posts

Posted - 02/26/2007 :  01:25:42
Nice article. There's a bug in the CROSS JOIN example: the phrase "AS RunningTotal" needs to be cut'n'pasted three lines higher so that it reads:

SUM(Sales) As RunningTotal

ismailc
Constraint Violating Yak Guru

South Africa
286 Posts

Posted - 01/05/2009 :  08:22:26
Great Article

Is there a way of resetting the running total to 0 based on certain criteria and then continuing the calculation?

When the date is the 1st of the month (Split_Date LIKE '01%'), Accum_Tot should be reset to 0.

Regards
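
A sketch of one way, reusing the double-assignment trick from earlier in the thread; Split_Date and Accum_Tot come from the question above, while the #Totals table and its Amount column are assumed:

DECLARE @Accum money
SET @Accum = 0

UPDATE #Totals
SET @Accum = Accum_Tot = CASE WHEN Split_Date LIKE '01%'
                              THEN Amount            -- 1st of the month: restart
                              ELSE @Accum + Amount   -- otherwise keep accumulating
                         END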

SwePeso
Patron Saint of Lost Yaks

Sweden
30105 Posts

Posted - 01/05/2009 :  09:02:50
As suggested before, ismailc, see http://weblogs.sqlteam.com/peterl/archive/2008/11/26/Finding-streaks-in-data.aspx



E 12°55'05.63"
N 56°04'39.26"