Exponential decay coefficient for x,y dataset

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

Transact-SQL (2005)

Exponential decay coefficient for x,y dataset

Author

Topic

davidhills
Starting Member

14 Posts

Posted - 2009-08-20 : 11:25:37

Good Morning

Can anyone help me create a table-value function, that I can pass in, a two dimensional dataset and return a best fit exponential decay coeffiecient.

I would like to replicate the LOGEST excel function when used like this in excel

LOGEST({1,2,4,8,16,32},{0,1,2,3,4,5}) = 2

in a manner like this in SQL, using a table-valued function

select sql_logest_tabularfunction(select x,y from mydataset_table) as exp

exp
---
2

many thanks

David

robvolk
Most Valuable Yak

15732 Posts

Posted - 2009-08-20 : 13:37:51

Without the actual formula that Excel uses, this is going to be very difficult.

It's possible to use COM objects to instantiate Excel, paste the data into a workbook, then evaluate the formula using the sp_OA system procedures (see Books Online for more details), but I'd recommend against that. A better option would be to create a CLR function or procedure, but you'd need to replicate the formula.

davidhills
Starting Member

14 Posts

Posted - 2009-08-20 : 14:22:37

thanks for the reply

I wanted to avoid using excel.

I have a vast array of different 2 D datasets that I want to compare the decay rates of and the processing overhead of shoveling them out to excel via CLR and dot net could be too expensive.

How does a best fit exponential algorithum usually get applied/generated for a discrete 2d dataset?

Anyone know the formula needed?

thanks

David

robvolk
Most Valuable Yak

15732 Posts

Posted - 2009-08-20 : 15:08:05

quote:
Originally posted by davidhills
the processing overhead of shoveling them out to excel via CLR and dot net could be too expensive.

If it were least squares linear regression I'd agree, since that's fairly easy to do in SQL:

http://en.wikipedia.org/wiki/Linear_least_squares

quote:
How does a best fit exponential algorithum usually get applied/generated for a discrete 2d dataset? Anyone know the formula needed?

Unless you can apply a log function to your data and do linear regression on it, then you'll probably have to use one of these:

http://en.wikipedia.org/wiki/Non-linear_least_squares

And that's where you'd need to know the exact formula Excel uses. That may not be easy to do in SQL, and handing it off to a CLR function, even if you have to write it yourself, would probably perform a lot better.

As I thought, Peter probably has a solution here:

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=77311

davidhills
Starting Member

14 Posts

Posted - 2009-08-20 : 15:46:09

thanks again for replying

I cobbled this together from some other threads, is it of any use? I know the syntax for the table function is wrong but what about the under lying logic.

I can't see how it's doing a exponential curve fit by just summing some coeffients.

at the end of the day I am going to be after some measure of the half life or such, as I am looking for a simplistic measure of the rate of decay of the curve to compare curves.

CREATE FUNCTION dbo.fnCurveFitting
(
@CT TABLE
)
RETURNS @b DECIMAL (b DECIMAL(38, 10))
AS
/*
Exponential y = a*e^(b*x)
*/
BEGIN
DECLARE @n DECIMAL(38, 10),
@x DECIMAL(38, 10),
@x2 DECIMAL(38, 10),
@y DECIMAL(38, 10),
@xy DECIMAL(38, 10),
@d DECIMAL(38, 10),
@b DECIMAL(38, 10)

SELECT @n = COUNT(*),
@x = SUM(x),
@y = SUM(LOG(y)),
@xy = SUM(x * LOG(y)),
@d = @n * @x2 - @x * @x
FROM CT
SELECT @b = (@n * @xy - @x * @y) / @d
RETURN
END

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-08-20 : 16:18:06

http://support.microsoft.com/kb/828528

N 56°04'39.26"
E 12°55'05.63"

davidhills
Starting Member

14 Posts

Posted - 2009-08-21 : 04:33:45

Morning

looks like i'll need a custom Aggregate Function if I want to pass in a table and return a scaler value?

or is it possible with either a table-valued or Scaler-valued function somehow?

The LOGEST excel function appears to use a least squares best fit.

thanks

David Hills

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-08-21 : 04:37:40

So is my code. It's using Least Squares Best Fit, adapted for logrithm, exponentials and powers.
What you need is to return A, b and R2 values.

A and b are the coefficient for the formula, and R2 is a value from 0 to 1 (being the best) how well the datapoints lie on the "least squares best fit" theoretical curve.

N 56°04'39.26"
E 12°55'05.63"

davidhills
Starting Member

14 Posts

Posted - 2009-08-21 : 12:01:37

Hi Peso

thanks for your code, much appreciated.

I am trying to port it to a sql Aggregate Function.

I am a bit out om my comfort zone here.

I am trying to get the exponential decay coefient from datasets like this. (in the end I am going to be after their half life)

I have a very large number of these datasets, which I want to compare there decay rates. exponential decay seems a good model.

test data
the x values (1,2,3,4,5,6,7,8,9,10)
y values (512,256,128,64,32,16,8,4,2,1)

The x value is week one in the year
the y values is the number of new customers in week 1

so by week 9 of the year we only have 2 of the orginal 512 as customers.

You can see why I am in a hurry !!!!!

SELECT dbo.exponent(sample) [Three integers]
FROM (
SELECT 512 as Sample
UNION ALL SELECT 256
UNION ALL SELECT 128
UNION ALL SELECT 64
UNION ALL SELECT 32
UNION ALL SELECT 16
UNION ALL SELECT 8
UNION ALL SELECT 4
UNION ALL SELECT 2
UNION ALL SELECT 1
) Numbers

answer = -1.68712476968374

I'm using follow vb project

Is this a viable aproach?

Imports System
Imports System.Data.Sql
Imports System.Data.SqlTypes
Imports System.Runtime.InteropServices
Imports System.Data
Imports System.Data.SqlClient
Imports Microsoft.SqlServer.Server
Imports System.Collections.Generic
Imports System.Text
Imports System.Runtime.Serialization.Formatters.Binary
Imports System.IO

''' <summary>
''' Sample user-defined aggregate that takes the product
''' of multiple values. NULLs are ignored, except that if there
''' are no non-null values, the result is NULL.
''' </summary>
''' <remarks>This is sample code created with a Beta version of
''' Visual Studio 2005. Do not use this in a production application.
'''
''' by Andrew Novick http://www.NovickSoftware.com
''' </remarks>
<Serializable()> _
<StructLayout(LayoutKind.Sequential)> _
<SqlUserDefinedAggregate(Format.Native, IsInvariantToNulls:=True, _
IsInvariantToOrder:=True, IsNullIfEmpty:=True, _
IsInvariantToDuplicates:=False)> _
Public Class Exponent

Private m_Accumulator As SqlDouble
Private m_HasNonNull As Boolean
Private x As SqlDouble
Private x2 As SqlDouble
Private y As SqlDouble
Private xy As SqlDouble
Private d As SqlDouble
Private b As SqlDouble
Dim xtot As SqlDouble
Dim n As SqlDouble
Dim ytotlog As SqlDouble
' Tracks that any non-nulls
' have been passed to the Accumulate or Merge methods.

''' <summary>
''' Called once when the class is being initialized or reused.
''' </summary>
''' <remarks></remarks>
Public Sub Init()
m_Accumulator = 1.0
n = 0.0
x = 0.0
x2 = 0.0
xy = 0.0
m_HasNonNull = False
End Sub

''' <summary>
''' Called once for each row. value could be NULL.
''' </summary>
''' <param name="y">New value to aggregate by
''' multiplication</param>
''' <remarks></remarks>
Public Sub Accumulate(ByVal y As SqlDouble)

If y.IsNull Then Exit Sub

n = n + 1
x = x + n
x2 = x2 + (Math.Log(x) * Math.Log(x))
y = y + Math.Log(y)
xy = xy + (x * Math.Log(y))
d = n * x2 - x * x

b = (n * xy - x * y) / d
m_Accumulator = b

m_HasNonNull = True
End Sub

''' <summary>
''' Joins two Product aggregates.
''' </summary>
''' <param name="y">The Product aggregate to join to this one.</param>
''' <remarks></remarks>
Public Sub Merge(ByVal y As Exponent)

' If value Is Nothing Then Exit Sub
' If Not value.m_HasNonNull Then Exit Sub
' m_Accumulator *= value.m_Accumulator
' m_HasNonNull = True

End Sub

''' <summary>
''' Returns the result.
''' </summary>
''' <returns>Product of all non-null arguments.</returns>
''' <remarks></remarks>
Public Function Terminate() As SqlDouble
If Not m_HasNonNull Then
Return 0
Else
Return m_Accumulator
End If
End Function
End Class

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-08-21 : 13:19:29

I think you need "ByVal x As SqlDouble" in the Accumulate Sub.

N 56°04'39.26"
E 12°55'05.63"

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-08-21 : 13:37:43

Some MSDN references
http://msdn.microsoft.com/en-us/library/91e6taax(VS.80).aspx

http://dotnetslackers.com/Community/blogs/basharkokash/archive/2008/06/07/how-to-implement-your-own-aggregate-function-in-sqlclr-sql-server-2005.aspx
http://www.code-magazine.com/Article.aspx?quickid=0712132

N 56°04'39.26"
E 12°55'05.63"

davidhills
Starting Member

14 Posts

Posted - 2009-08-21 : 13:48:49

Evening Peter

The x value increments by 1 for each y value.

I thought I could get away with starting with x =0 for the first y value
then incrementing x by 1 with each subsequent y value.

A general solution passing x and y would obviously be better, I am not sure if the sql aggregate CLR framework interface will take a second arguement as below.

Do you think using the sql aggrecate function is an appropriate approach to solving this?

Does your solution use iterative regression, if so what and how do you decide the first values?

Time to start the weekend, thanks for your help

David

Public Sub Accumulate(ByVal y As SqlDouble,Byval x as SqlDouble)

If y.IsNull Then Exit Sub

n = n + 1
x = x + n
x2 = x2 + (Math.Log(x) * Math.Log(x))
y = y + Math.Log(y)
xy = xy + (x * Math.Log(y))
d = n * x2 - x * x

b = (n * xy - x * y) / d
m_Accumulator = b

m_HasNonNull = True
End Sub

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-08-21 : 14:01:50

You should NEVER anticipate the increment by 1. And you should NEVER anticipate that the records are read in the order you think.
Every Y value is tied to a specific X value. Period.

The critical part to get the correct value is "xy = xy + (x * Math.Log(y))"

N 56°04'39.26"
E 12°55'05.63"

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-08-21 : 17:32:10

Here is an example on TechNet using two parameters
http://technet.microsoft.com/en-us/library/ms131056.aspx

N 56°04'39.26"
E 12°55'05.63"

davidhills
Starting Member

14 Posts

Posted - 2009-08-24 : 07:24:12

May be 2008 only.

Won't deploy from 2005
Error 1 The Accumulate method in user defined aggregate "SampleAggregate.WeightedAvg" must have exactly one parameter. SampleAggregate

Trying sp3 for 2005

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-08-24 : 08:08:12

What you can do, is to have your aggregate function to accept XML as input (then holdning both x and y values), parse the xml in the function.

N 56°04'39.26"
E 12°55'05.63"

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-08-25 : 10:43:54

I have a SQLCLR aggregate function ready.
Email me if you want to test a copy of it.

N 56°04'39.26"
E 12°55'05.63"

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-08-25 : 10:49:33

For your sample data above posted 08/21/2009 : 12:01:37 ,
here is the result from the SQLCLR function I wrote.

<dws bestfit="exponential" r2="0.999999999999997" type="least square regression">
  <linear A="341.666666666667" b="-43.5212121212121" r2="0.638140940193726" formula="y=A+b*x" />
  <exponential A="1024" b="-0.693147180559945" r2="0.999999999999997" formula="y=A*e^(b*x)" />
  <logarithmic A="423.270685914182" b="-212.501270315094" r2="0.89178978626192" formula="y=A+b*ln(x)" />
  <power A="1386.37307434986" b="-2.72455746591414" r2="0.905661305353485" formula="y=A*x^b" />
</dws>

N 56°04'39.26"
E 12°55'05.63"

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-08-25 : 10:52:27

This sample code illustrated the correctness


SELECT	x,
	y,
	1024 * EXP(-0.693147180559945 * x)
FROM	(
		SELECT	 1 AS x, 512 AS y UNION ALL
		SELECT	 2, 256 UNION ALL
		SELECT	 3, 128 UNION ALL
		SELECT	 4,  64 UNION ALL
		SELECT	 5,  32 UNION ALL
		SELECT	 6,  16 UNION ALL
		SELECT	 7,   8 UNION ALL
		SELECT	 8,   4 UNION ALL
		SELECT	 9,   2 UNION ALL
		SELECT	10,   1
	) AS d

and the output

x	y	(No column name)
1	512	512
2	256	256
3	128	128
4	64	64.0000000000001
5	32	32
6	16	16
7	8	8.00000000000002
8	4	4.00000000000001
9	2	2.00000000000001
10	1	1

N 56°04'39.26"
E 12°55'05.63"

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources