Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
akiles
Starting Member
8 Posts |
Posted - 2011-09-19 : 11:31:41
|
Hi,I needed a way to calculate the median of a set of data. I found some pre-cooked udfs online, but I wanted to create my own.I started by writing an sql statement that would do just that and then worked my way down to replacing parts of the statements with variables until the sql script would look like a function.So, the goal is this: obtain the median of a set of numbers but make it flexible enough so that one could specify a column from any view or table and obtain the median.I ended up with this:--note: replace variables @mySource and @calcField witht a name of the table and column existing in your db and it *should* work.use RealEstatedeclare@mySource nvarchar(50),@calcField nvarchar(50),@sqlDef nvarchar(250),@sqlDefMed nvarchar(250),@obs smallint,--numeric(14,4),@MedLoc1 smallint,--numeric(14,4),@MedLoc2 smallint,--numeric(14,4),@Median numeric(14,4);SELECT @mySource='dbo.land';SELECT @calcField='Area';SELECT @sqlDef = 'SELECT @obs = count(' + @calcField + ') from ' + @mySource + ' where ' + @calcField + ' IS NOT NULL';EXECUTE SP_EXECUTESQL @query = @sqlDef, @params = N'@obs smallint output', @obs = @obs output; --SELECT @@ROWCOUNT as rowc --an alternative to countIF @Obs%2=0 BEGIN SELECT @MedLoc1 = @Obs/2; SELECT @MedLoc2 = @MedLoc1+1; SELECT @sqlDefMed = 'SELECT @Median = Sum(' + @calcField + ')/2 FROM (SELECT ' + @calcField + ', ROW_NUMBER() OVER(ORDER BY ' + @calcField + ' asc) as _Rank FROM ' + @MySource + ' WHERE ' + @calcField + ' is not null) as t WHERE t._Rank between ' + rtrim(cast(@MedLoc1 as NCHAR)) + ' AND '+ rtrim(cast(@MedLoc2 as NCHAR)); --SELECT @sqlDefMed; EXECUTE SP_EXECUTESQL @query = @sqlDefMed, @Params = N'@Median numeric (14,4) output', @Median = @Median output;END ELSE BEGIN SELECT @MedLoc1 = floor(@Obs/2)+1 SELECT @MedLoc2 = floor(@Obs/2)+1 SELECT @sqlDefMed = 'SELECT @Median = ' + @calcField + ' FROM (SELECT ' + @calcField + ', ROW_NUMBER() OVER(ORDER BY ' + @calcField + ' asc) as _Rank FROM ' + @MySource + ' WHERE ' + @calcField + ' is not null) as t WHERE t._Rank between ' + rtrim(cast(@MedLoc1 as NCHAR)) + ' AND '+ rtrim(cast(@MedLoc2 as NCHAR)); --SELECT @sqlDefMed; EXECUTE SP_EXECUTESQL @query = @sqlDefMed, @Params = N'@Median numeric (14,4) output', @Median = @Median output;ENDSELECT @obs as obs, @MedLoc1 as medloc1, @MedLoc2 as medloc2, @Median as medianSince it worked, I turned it into a function:USE [RealEstate]GOSET ANSI_NULLS ONGOSET QUOTED_IDENTIFIER ONGOCREATE FUNCTION [dbo].[fn_getMedian] ( @calcField nvarchar(50), @mySource nvarchar(50)) RETURNS numeric(14,4)with execute as callerASBEGIN -- Declare the return variable here DECLARE @Median numeric(14,4), @sqlDef nvarchar(250), @sqlDefMed nvarchar(250), @obs smallint, @MedLoc1 smallint, @MedLoc2 smallint SELECT @mySource='dbo.land'; SELECT @calcField='Area'; SELECT @sqlDef = 'SELECT @obs = count(' + @calcField + ') from ' + @mySource + ' where ' + @calcField + ' IS NOT NULL'; EXECUTE SP_EXECUTESQL @query = @sqlDef, @params = N'@obs smallint output', @obs = @obs output; --SELECT @@ROWCOUNT as rowc --an alternative to count IF @Obs%2=0 BEGIN SELECT @MedLoc1 = @Obs/2; SELECT @MedLoc2 = @MedLoc1+1; SELECT @sqlDefMed = 'SELECT @Median = Sum(' + @calcField + ')/2 FROM (SELECT ' + @calcField + ', ROW_NUMBER() OVER(ORDER BY ' + @calcField + ' asc) as _Rank FROM ' + @MySource + ' WHERE ' + @calcField + ' is not null) as t WHERE t._Rank between ' + rtrim(cast(@MedLoc1 as NCHAR)) + ' AND '+ rtrim(cast(@MedLoc2 as NCHAR)); --SELECT @sqlDefMed; EXECUTE SP_EXECUTESQL @query = @sqlDefMed, @Params = N'@Median numeric (14,4) output', @Median = @Median output; END ELSE BEGIN SELECT @MedLoc1 = floor(@Obs/2)+1 SELECT @MedLoc2 = floor(@Obs/2)+1 SELECT @sqlDefMed = 'SELECT @Median = ' + @calcField + ' FROM (SELECT ' + @calcField + ', ROW_NUMBER() OVER(ORDER BY ' + @calcField + ' asc) as _Rank FROM ' + @MySource + ' WHERE ' + @calcField + ' is not null) as t WHERE t._Rank between ' + rtrim(cast(@MedLoc1 as NCHAR)) + ' AND '+ rtrim(cast(@MedLoc2 as NCHAR)); --SELECT @sqlDefMed; EXECUTE SP_EXECUTESQL @query = @sqlDefMed, @Params = N'@Median numeric (14,4) output', @Median = @Median output; END -- Return the result of the function RETURN(isnull(@Median,0))ENDGOBut when I run it like:use realestateselect dbo.fn_getMedian('area','dbo.land')I get:sg 557, Level 16, State 2, Line 2Only functions and some extended stored procedures can be executed from within a function.I've been thru each line several times and I just can't figure out why is not working!!!!Part of it is that I am less than a noob so my sql-fu is very weak.Could someone point me in the right direction, throw me a bone?I appreciate your time, and I apologize for my noob-ness. |
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2011-09-19 : 12:03:09
|
| You cannot use dynamic SQL in a function. Either put that as a stored procedure, or revisit the design. In general procedures or functions that take table or column names are signs of a poor database design or poor application design. This actually looks like it should be a CLR user-defined aggregate.--Gail ShawSQL Server MVP |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2011-09-19 : 14:44:34
|
See http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=165297.You can substitute the MONTH_NAME and NUMBER columns with a CASE statement of your choice.After all, not all columns in a table can be used for median calculations; only numeric ones. N 56°04'39.26"E 12°55'05.63" |
 |
|
|
akiles
Starting Member
8 Posts |
Posted - 2011-09-19 : 15:57:12
|
Thanks for your response, GilaMonster.That you cannot use dynamic sql in a function explains why the function fails -thank you!I'm not trying to be a smart-a$$ (I'm just VERY green), but why is this a bad approach? The original reason for wanting to use dynamic sql is because I have to perform the same analysis, several times a week, on several different tables. The next thing to tackle is to do the same thing within groups within tables(as an aggregate function). I will look into your clr suggestion.Thanks again, GilaMonster.quote: Originally posted by GilaMonster You cannot use dynamic SQL in a function. Either put that as a stored procedure, or revisit the design. In general procedures or functions that take table or column names are signs of a poor database design or poor application design. This actually looks like it should be a CLR user-defined aggregate.--Gail ShawSQL Server MVP
|
 |
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2011-09-19 : 16:39:43
|
quote: Originally posted by akiles I'm not trying to be a smart-a$$ (I'm just VERY green), but why is this a bad approach?
Apples, watermelons and spare tires. It generally means that data model is badly designed and you (at development time) don't know what is going to be done with data. It also sometime just indicates laziness.Would you write a function in C# that blindly did X to any object whatsoever that was passed in?quote: The original reason for wanting to use dynamic sql is because I have to perform the same analysis, several times a week, on several different tables.
So you have several tables storing the same kind of data? Or do you just not want to write the specific queries for each table?--Gail ShawSQL Server MVP |
 |
|
|
akiles
Starting Member
8 Posts |
Posted - 2011-09-19 : 16:59:27
|
I don't know C# -just a little vb and some S :(But I would not blindly do anything to x ->I am being a smart a$$ now :)My experience with sql is limited to extracting data (simple queries). Usually I get what I need and manipulate it in R, excel or access.The database I'm trying to work with it was purchased from a company. It's for valuing real estate. The value of the real estate is stored who-knows-where. When the real estate sells, that information is stored in who-knows-where. I'm not kidding, I've asked the developers and I'm yet to get an straight answer. I'm working from views given to me by the developers.So if the real estate sells for 100, and we think it should sell for 120, we are off by 20%. I don't give a rat's behind about the median, but the "higher powers" do, so I have to get the median for the actual price, for our model valuation, and then for the value we are off by.We can still do all this from outside. I'd just thought I'd give it a try and attempt to create a function for the median. My thought was this: if you can do select avg(x) from table why not do the same for the median?What about this: modify the existing function so that the input variable is a set of data?so that the function fn_median(of x) is called as follows:fn_median("select avg(x) from table") But I'm not sure about this approach because it seems to me that the select statement would still be a dynamic sql?Sorry, mate -my post became a novel. I really appreciate your time.quote: Originally posted by GilaMonster
quote: Originally posted by akiles I'm not trying to be a smart-a$$ (I'm just VERY green), but why is this a bad approach?
Apples, watermelons and spare tires. It generally means that data model is badly designed and you (at development time) don't know what is going to be done with data. It also sometime just indicates laziness.Would you write a function in C# that blindly did X to any object whatsoever that was passed in?quote: The original reason for wanting to use dynamic sql is because I have to perform the same analysis, several times a week, on several different tables.
So you have several tables storing the same kind of data? Or do you just not want to write the specific queries for each table?--Gail ShawSQL Server MVP |
 |
|
|
akiles
Starting Member
8 Posts |
Posted - 2011-09-19 : 17:04:44
|
Your code is so elegant... so compact, I don't get it :)For what I can see, you are using a temp table. I was trying to avoid that scenario as this is not my database.quote: Originally posted by SwePeso See http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=165297.You can substitute the MONTH_NAME and NUMBER columns with a CASE statement of your choice.After all, not all columns in a table can be used for median calculations; only numeric ones. N 56°04'39.26"E 12°55'05.63"
|
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2011-09-20 : 02:54:50
|
The temp table is only to demonstrate a source table. You can substitute it with any other table. N 56°04'39.26"E 12°55'05.63" |
 |
|
|
|
|
|
|
|