Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 SQL commands fail when used as UDF

Author  Topic 

akiles
Starting Member

8 Posts

Posted - 2011-09-19 : 11:31:41
Hi,
I needed a way to calculate the median of a set of data. I found some pre-cooked udfs online, but I wanted to create my own.

I started by writing an sql statement that would do just that and then worked my way down to replacing parts of the statements with variables until the sql script would look like a function.

So, the goal is this: obtain the median of a set of numbers but make it flexible enough so that one could specify a column from any view or table and obtain the median.

I ended up with this:
--note: replace variables @mySource and @calcField witht a name of the table and column existing in your db and it *should* work.



use RealEstate
declare
@mySource nvarchar(50),
@calcField nvarchar(50),
@sqlDef nvarchar(250),
@sqlDefMed nvarchar(250),
@obs smallint,--numeric(14,4),
@MedLoc1 smallint,--numeric(14,4),
@MedLoc2 smallint,--numeric(14,4),
@Median numeric(14,4);

SELECT @mySource='dbo.land';
SELECT @calcField='Area';
SELECT @sqlDef = 'SELECT @obs = count(' + @calcField + ') from '
+ @mySource + ' where ' + @calcField + ' IS NOT NULL';
EXECUTE SP_EXECUTESQL
@query = @sqlDef,
@params = N'@obs smallint output',
@obs = @obs output;
--SELECT @@ROWCOUNT as rowc --an alternative to count
IF @Obs%2=0 BEGIN
SELECT @MedLoc1 = @Obs/2;
SELECT @MedLoc2 = @MedLoc1+1;
SELECT @sqlDefMed = 'SELECT @Median = Sum(' + @calcField + ')/2 FROM (SELECT ' +
@calcField + ', ROW_NUMBER() OVER(ORDER BY ' + @calcField + ' asc) as _Rank FROM ' +
@MySource + ' WHERE ' + @calcField + ' is not null) as t WHERE t._Rank between ' +
rtrim(cast(@MedLoc1 as NCHAR)) + ' AND '+ rtrim(cast(@MedLoc2 as NCHAR));
--SELECT @sqlDefMed;
EXECUTE SP_EXECUTESQL
@query = @sqlDefMed,
@Params = N'@Median numeric (14,4) output',
@Median = @Median output;
END ELSE BEGIN
SELECT @MedLoc1 = floor(@Obs/2)+1
SELECT @MedLoc2 = floor(@Obs/2)+1
SELECT @sqlDefMed = 'SELECT @Median = ' + @calcField + ' FROM (SELECT ' +
@calcField + ', ROW_NUMBER() OVER(ORDER BY ' + @calcField + ' asc) as _Rank FROM ' +
@MySource + ' WHERE ' + @calcField + ' is not null) as t WHERE t._Rank between ' +
rtrim(cast(@MedLoc1 as NCHAR)) + ' AND '+ rtrim(cast(@MedLoc2 as NCHAR));
--SELECT @sqlDefMed;
EXECUTE SP_EXECUTESQL
@query = @sqlDefMed,
@Params = N'@Median numeric (14,4) output',
@Median = @Median output;
END
SELECT @obs as obs, @MedLoc1 as medloc1, @MedLoc2 as medloc2, @Median as median


Since it worked, I turned it into a function:

USE [RealEstate]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[fn_getMedian] (
@calcField nvarchar(50),
@mySource nvarchar(50))

RETURNS numeric(14,4)
with execute as caller
AS
BEGIN
-- Declare the return variable here
DECLARE
@Median numeric(14,4),
@sqlDef nvarchar(250),
@sqlDefMed nvarchar(250),
@obs smallint,
@MedLoc1 smallint,
@MedLoc2 smallint

SELECT @mySource='dbo.land';
SELECT @calcField='Area';
SELECT @sqlDef = 'SELECT @obs = count(' + @calcField + ') from '
+ @mySource + ' where ' + @calcField + ' IS NOT NULL';
EXECUTE SP_EXECUTESQL
@query = @sqlDef,
@params = N'@obs smallint output',
@obs = @obs output;
--SELECT @@ROWCOUNT as rowc --an alternative to count
IF @Obs%2=0 BEGIN
SELECT @MedLoc1 = @Obs/2;
SELECT @MedLoc2 = @MedLoc1+1;
SELECT @sqlDefMed = 'SELECT @Median = Sum(' + @calcField + ')/2 FROM (SELECT ' +
@calcField + ', ROW_NUMBER() OVER(ORDER BY ' + @calcField + ' asc) as _Rank FROM ' +
@MySource + ' WHERE ' + @calcField + ' is not null) as t WHERE t._Rank between ' +
rtrim(cast(@MedLoc1 as NCHAR)) + ' AND '+ rtrim(cast(@MedLoc2 as NCHAR));
--SELECT @sqlDefMed;
EXECUTE SP_EXECUTESQL
@query = @sqlDefMed,
@Params = N'@Median numeric (14,4) output',
@Median = @Median output;
END ELSE BEGIN
SELECT @MedLoc1 = floor(@Obs/2)+1
SELECT @MedLoc2 = floor(@Obs/2)+1
SELECT @sqlDefMed = 'SELECT @Median = ' + @calcField + ' FROM (SELECT ' +
@calcField + ', ROW_NUMBER() OVER(ORDER BY ' + @calcField + ' asc) as _Rank FROM ' +
@MySource + ' WHERE ' + @calcField + ' is not null) as t WHERE t._Rank between ' +
rtrim(cast(@MedLoc1 as NCHAR)) + ' AND '+ rtrim(cast(@MedLoc2 as NCHAR));
--SELECT @sqlDefMed;
EXECUTE SP_EXECUTESQL
@query = @sqlDefMed,
@Params = N'@Median numeric (14,4) output',
@Median = @Median output;
END
-- Return the result of the function
RETURN(isnull(@Median,0))
END

GO



But when I run it like:

use realestate
select dbo.fn_getMedian('area','dbo.land')


I get:
sg 557, Level 16, State 2, Line 2
Only functions and some extended stored procedures can be executed from within a function.

I've been thru each line several times and I just can't figure out why is not working!!!!
Part of it is that I am less than a noob so my sql-fu is very weak.

Could someone point me in the right direction, throw me a bone?

I appreciate your time, and I apologize for my noob-ness.

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2011-09-19 : 12:03:09
You cannot use dynamic SQL in a function.

Either put that as a stored procedure, or revisit the design. In general procedures or functions that take table or column names are signs of a poor database design or poor application design.

This actually looks like it should be a CLR user-defined aggregate.

--
Gail Shaw
SQL Server MVP
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2011-09-19 : 14:44:34
See http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=165297.
You can substitute the MONTH_NAME and NUMBER columns with a CASE statement of your choice.
After all, not all columns in a table can be used for median calculations; only numeric ones.



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

akiles
Starting Member

8 Posts

Posted - 2011-09-19 : 15:57:12
Thanks for your response, GilaMonster.
That you cannot use dynamic sql in a function explains why the function fails -thank you!

I'm not trying to be a smart-a$$ (I'm just VERY green), but why is this a bad approach? The original reason for wanting to use dynamic sql is because I have to perform the same analysis, several times a week, on several different tables. The next thing to tackle is to do the same thing within groups within tables(as an aggregate function). I will look into your clr suggestion.

Thanks again, GilaMonster.


quote:
Originally posted by GilaMonster

You cannot use dynamic SQL in a function.

Either put that as a stored procedure, or revisit the design. In general procedures or functions that take table or column names are signs of a poor database design or poor application design.

This actually looks like it should be a CLR user-defined aggregate.

--
Gail Shaw
SQL Server MVP

Go to Top of Page

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2011-09-19 : 16:39:43
quote:
Originally posted by akiles

I'm not trying to be a smart-a$$ (I'm just VERY green), but why is this a bad approach?


Apples, watermelons and spare tires.

It generally means that data model is badly designed and you (at development time) don't know what is going to be done with data. It also sometime just indicates laziness.

Would you write a function in C# that blindly did X to any object whatsoever that was passed in?

quote:
The original reason for wanting to use dynamic sql is because I have to perform the same analysis, several times a week, on several different tables.


So you have several tables storing the same kind of data? Or do you just not want to write the specific queries for each table?

--
Gail Shaw
SQL Server MVP
Go to Top of Page

akiles
Starting Member

8 Posts

Posted - 2011-09-19 : 16:59:27
I don't know C# -just a little vb and some S :(
But I would not blindly do anything to x ->I am being a smart a$$ now :)
My experience with sql is limited to extracting data (simple queries). Usually I get what I need and manipulate it in R, excel or access.

The database I'm trying to work with it was purchased from a company. It's for valuing real estate. The value of the real estate is stored who-knows-where. When the real estate sells, that information is stored in who-knows-where. I'm not kidding, I've asked the developers and I'm yet to get an straight answer. I'm working from views given to me by the developers.

So if the real estate sells for 100, and we think it should sell for 120, we are off by 20%. I don't give a rat's behind about the median, but the "higher powers" do, so I have to get the median for the actual price, for our model valuation, and then for the value we are off by.

We can still do all this from outside. I'd just thought I'd give it a try and attempt to create a function for the median. My thought was this: if you can do
select avg(x) from table
why not do the same for the median?

What about this: modify the existing function so that the input variable is a set of data?
so that the function fn_median(of x) is called as follows:
fn_median("select avg(x) from table")
But I'm not sure about this approach because it seems to me that the select statement would still be a dynamic sql?

Sorry, mate -my post became a novel. I really appreciate your time.


quote:
Originally posted by GilaMonster

quote:
Originally posted by akiles

I'm not trying to be a smart-a$$ (I'm just VERY green), but why is this a bad approach?


Apples, watermelons and spare tires.

It generally means that data model is badly designed and you (at development time) don't know what is going to be done with data. It also sometime just indicates laziness.

Would you write a function in C# that blindly did X to any object whatsoever that was passed in?

quote:
The original reason for wanting to use dynamic sql is because I have to perform the same analysis, several times a week, on several different tables.


So you have several tables storing the same kind of data? Or do you just not want to write the specific queries for each table?

--
Gail Shaw
SQL Server MVP

Go to Top of Page

akiles
Starting Member

8 Posts

Posted - 2011-09-19 : 17:04:44
Your code is so elegant... so compact, I don't get it :)
For what I can see, you are using a temp table. I was trying to avoid that scenario as this is not my database.

quote:
Originally posted by SwePeso

See http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=165297.
You can substitute the MONTH_NAME and NUMBER columns with a CASE statement of your choice.
After all, not all columns in a table can be used for median calculations; only numeric ones.



N 56°04'39.26"
E 12°55'05.63"


Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2011-09-20 : 02:54:50
The temp table is only to demonstrate a source table. You can substitute it with any other table.



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page
   

- Advertisement -