| Author |
Topic |
|
davidhills
Starting Member
14 Posts |
Posted - 2009-08-20 : 11:25:37
|
| Good MorningCan anyone help me create a table-value function, that I can pass in, a two dimensional dataset and return a best fit exponential decay coeffiecient.I would like to replicate the LOGEST excel function when used like this in excelLOGEST({1,2,4,8,16,32},{0,1,2,3,4,5}) = 2 in a manner like this in SQL, using a table-valued functionselect sql_logest_tabularfunction(select x,y from mydataset_table) as expexp---2many thanksDavid |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2009-08-20 : 13:37:51
|
| Without the actual formula that Excel uses, this is going to be very difficult.It's possible to use COM objects to instantiate Excel, paste the data into a workbook, then evaluate the formula using the sp_OA system procedures (see Books Online for more details), but I'd recommend against that. A better option would be to create a CLR function or procedure, but you'd need to replicate the formula. |
 |
|
|
davidhills
Starting Member
14 Posts |
Posted - 2009-08-20 : 14:22:37
|
| thanks for the replyI wanted to avoid using excel.I have a vast array of different 2 D datasets that I want to compare the decay rates of and the processing overhead of shoveling them out to excel via CLR and dot net could be too expensive.How does a best fit exponential algorithum usually get applied/generated for a discrete 2d dataset?Anyone know the formula needed?thanksDavid |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2009-08-20 : 15:08:05
|
quote: Originally posted by davidhillsthe processing overhead of shoveling them out to excel via CLR and dot net could be too expensive.
If it were least squares linear regression I'd agree, since that's fairly easy to do in SQL:http://en.wikipedia.org/wiki/Linear_least_squaresquote: How does a best fit exponential algorithum usually get applied/generated for a discrete 2d dataset? Anyone know the formula needed?
Unless you can apply a log function to your data and do linear regression on it, then you'll probably have to use one of these:http://en.wikipedia.org/wiki/Non-linear_least_squaresAnd that's where you'd need to know the exact formula Excel uses. That may not be easy to do in SQL, and handing it off to a CLR function, even if you have to write it yourself, would probably perform a lot better.As I thought, Peter probably has a solution here:http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=77311 |
 |
|
|
davidhills
Starting Member
14 Posts |
Posted - 2009-08-20 : 15:46:09
|
| thanks again for replyingI cobbled this together from some other threads, is it of any use? I know the syntax for the table function is wrong but what about the under lying logic.I can't see how it's doing a exponential curve fit by just summing some coeffients.at the end of the day I am going to be after some measure of the half life or such, as I am looking for a simplistic measure of the rate of decay of the curve to compare curves.CREATE FUNCTION dbo.fnCurveFitting( @CT TABLE)RETURNS @b DECIMAL (b DECIMAL(38, 10))AS/* Exponential y = a*e^(b*x)*/BEGIN DECLARE @n DECIMAL(38, 10), @x DECIMAL(38, 10), @x2 DECIMAL(38, 10), @y DECIMAL(38, 10), @xy DECIMAL(38, 10), @d DECIMAL(38, 10), @b DECIMAL(38, 10) SELECT @n = COUNT(*), @x = SUM(x), @y = SUM(LOG(y)), @xy = SUM(x * LOG(y)), @d = @n * @x2 - @x * @x FROM CT SELECT @b = (@n * @xy - @x * @y) / @d RETURNEND |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2009-08-20 : 16:18:06
|
http://support.microsoft.com/kb/828528 N 56°04'39.26"E 12°55'05.63" |
 |
|
|
davidhills
Starting Member
14 Posts |
Posted - 2009-08-21 : 04:33:45
|
| Morninglooks like i'll need a custom Aggregate Function if I want to pass in a table and return a scaler value?or is it possible with either a table-valued or Scaler-valued function somehow?The LOGEST excel function appears to use a least squares best fit.thanksDavid Hills |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2009-08-21 : 04:37:40
|
So is my code. It's using Least Squares Best Fit, adapted for logrithm, exponentials and powers.What you need is to return A, b and R2 values.A and b are the coefficient for the formula, and R2 is a value from 0 to 1 (being the best) how well the datapoints lie on the "least squares best fit" theoretical curve. N 56°04'39.26"E 12°55'05.63" |
 |
|
|
davidhills
Starting Member
14 Posts |
Posted - 2009-08-21 : 12:01:37
|
| Hi Pesothanks for your code, much appreciated.I am trying to port it to a sql Aggregate Function.I am a bit out om my comfort zone here.I am trying to get the exponential decay coefient from datasets like this. (in the end I am going to be after their half life)I have a very large number of these datasets, which I want to compare there decay rates. exponential decay seems a good model.test datathe x values (1,2,3,4,5,6,7,8,9,10) y values (512,256,128,64,32,16,8,4,2,1)The x value is week one in the yearthe y values is the number of new customers in week 1so by week 9 of the year we only have 2 of the orginal 512 as customers.You can see why I am in a hurry !!!!!SELECT dbo.exponent(sample) [Three integers] FROM ( SELECT 512 as Sample UNION ALL SELECT 256 UNION ALL SELECT 128 UNION ALL SELECT 64 UNION ALL SELECT 32 UNION ALL SELECT 16 UNION ALL SELECT 8 UNION ALL SELECT 4 UNION ALL SELECT 2 UNION ALL SELECT 1 ) Numbersanswer = -1.68712476968374I'm using follow vb projectIs this a viable aproach?Imports SystemImports System.Data.SqlImports System.Data.SqlTypesImports System.Runtime.InteropServicesImports System.DataImports System.Data.SqlClientImports Microsoft.SqlServer.ServerImports System.Collections.GenericImports System.TextImports System.Runtime.Serialization.Formatters.BinaryImports System.IO''' <summary>''' Sample user-defined aggregate that takes the product ''' of multiple values. NULLs are ignored, except that if there''' are no non-null values, the result is NULL.''' </summary>''' <remarks>This is sample code created with a Beta version of ''' Visual Studio 2005. Do not use this in a production application.''' ''' by Andrew Novick http://www.NovickSoftware.com''' </remarks><Serializable()> _<StructLayout(LayoutKind.Sequential)> _<SqlUserDefinedAggregate(Format.Native, IsInvariantToNulls:=True, _ IsInvariantToOrder:=True, IsNullIfEmpty:=True, _ IsInvariantToDuplicates:=False)> _Public Class Exponent Private m_Accumulator As SqlDouble Private m_HasNonNull As Boolean Private x As SqlDouble Private x2 As SqlDouble Private y As SqlDouble Private xy As SqlDouble Private d As SqlDouble Private b As SqlDouble Dim xtot As SqlDouble Dim n As SqlDouble Dim ytotlog As SqlDouble ' Tracks that any non-nulls ' have been passed to the Accumulate or Merge methods. ''' <summary> ''' Called once when the class is being initialized or reused. ''' </summary> ''' <remarks></remarks> Public Sub Init() m_Accumulator = 1.0 n = 0.0 x = 0.0 x2 = 0.0 xy = 0.0 m_HasNonNull = False End Sub ''' <summary> ''' Called once for each row. value could be NULL. ''' </summary> ''' <param name="y">New value to aggregate by ''' multiplication</param> ''' <remarks></remarks> Public Sub Accumulate(ByVal y As SqlDouble) If y.IsNull Then Exit Sub n = n + 1 x = x + n x2 = x2 + (Math.Log(x) * Math.Log(x)) y = y + Math.Log(y) xy = xy + (x * Math.Log(y)) d = n * x2 - x * x b = (n * xy - x * y) / d m_Accumulator = b m_HasNonNull = True End Sub ''' <summary> ''' Joins two Product aggregates. ''' </summary> ''' <param name="y">The Product aggregate to join to this one.</param> ''' <remarks></remarks> Public Sub Merge(ByVal y As Exponent) ' If value Is Nothing Then Exit Sub ' If Not value.m_HasNonNull Then Exit Sub ' m_Accumulator *= value.m_Accumulator ' m_HasNonNull = True End Sub ''' <summary> ''' Returns the result. ''' </summary> ''' <returns>Product of all non-null arguments.</returns> ''' <remarks></remarks> Public Function Terminate() As SqlDouble If Not m_HasNonNull Then Return 0 Else Return m_Accumulator End If End FunctionEnd Class |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2009-08-21 : 13:19:29
|
I think you need "ByVal x As SqlDouble" in the Accumulate Sub. N 56°04'39.26"E 12°55'05.63" |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
|
|
davidhills
Starting Member
14 Posts |
Posted - 2009-08-21 : 13:48:49
|
| Evening PeterThe x value increments by 1 for each y value.I thought I could get away with starting with x =0 for the first y valuethen incrementing x by 1 with each subsequent y value.A general solution passing x and y would obviously be better, I am not sure if the sql aggregate CLR framework interface will take a second arguement as below.Do you think using the sql aggrecate function is an appropriate approach to solving this?Does your solution use iterative regression, if so what and how do you decide the first values?Time to start the weekend, thanks for your helpDavid Public Sub Accumulate(ByVal y As SqlDouble,Byval x as SqlDouble)If y.IsNull Then Exit Subn = n + 1x = x + nx2 = x2 + (Math.Log(x) * Math.Log(x))y = y + Math.Log(y)xy = xy + (x * Math.Log(y))d = n * x2 - x * xb = (n * xy - x * y) / dm_Accumulator = bm_HasNonNull = TrueEnd Sub |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2009-08-21 : 14:01:50
|
You should NEVER anticipate the increment by 1. And you should NEVER anticipate that the records are read in the order you think.Every Y value is tied to a specific X value. Period.The critical part to get the correct value is "xy = xy + (x * Math.Log(y))" N 56°04'39.26"E 12°55'05.63" |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
|
|
davidhills
Starting Member
14 Posts |
Posted - 2009-08-24 : 07:24:12
|
| May be 2008 only.Won't deploy from 2005Error 1 The Accumulate method in user defined aggregate "SampleAggregate.WeightedAvg" must have exactly one parameter. SampleAggregateTrying sp3 for 2005 |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2009-08-24 : 08:08:12
|
What you can do, is to have your aggregate function to accept XML as input (then holdning both x and y values), parse the xml in the function. N 56°04'39.26"E 12°55'05.63" |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2009-08-25 : 10:43:54
|
I have a SQLCLR aggregate function ready.Email me if you want to test a copy of it. N 56°04'39.26"E 12°55'05.63" |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2009-08-25 : 10:49:33
|
For your sample data above posted 08/21/2009 : 12:01:37 , here is the result from the SQLCLR function I wrote.<dws bestfit="exponential" r2="0.999999999999997" type="least square regression"> <linear A="341.666666666667" b="-43.5212121212121" r2="0.638140940193726" formula="y=A+b*x" /> <exponential A="1024" b="-0.693147180559945" r2="0.999999999999997" formula="y=A*e^(b*x)" /> <logarithmic A="423.270685914182" b="-212.501270315094" r2="0.89178978626192" formula="y=A+b*ln(x)" /> <power A="1386.37307434986" b="-2.72455746591414" r2="0.905661305353485" formula="y=A*x^b" /></dws> N 56°04'39.26"E 12°55'05.63" |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2009-08-25 : 10:52:27
|
This sample code illustrated the correctnessSELECT x, y, 1024 * EXP(-0.693147180559945 * x)FROM ( SELECT 1 AS x, 512 AS y UNION ALL SELECT 2, 256 UNION ALL SELECT 3, 128 UNION ALL SELECT 4, 64 UNION ALL SELECT 5, 32 UNION ALL SELECT 6, 16 UNION ALL SELECT 7, 8 UNION ALL SELECT 8, 4 UNION ALL SELECT 9, 2 UNION ALL SELECT 10, 1 ) AS d and the outputx y (No column name)1 512 5122 256 2563 128 1284 64 64.00000000000015 32 326 16 167 8 8.000000000000028 4 4.000000000000019 2 2.0000000000000110 1 1 N 56°04'39.26"E 12°55'05.63" |
 |
|
|
|