Performance and UDF's - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

SQL Server Development (2000)

Performance and UDF's

Author

Topic

aiken
Aged Yak Warrior

525 Posts

Posted - 2002-02-17 : 20:17:01

I'm starting to discover that user defined functions are a big performance problem. Worse, it seems that they are not factored into the execution plan as seen in query analyzer.

Here's an example:


-- Create basic log table
CREATE TABLE log_running (i int identity,datetime datetime)

-- Insert a bunch of data
DECLARE @iX int
While @iX<250000
  insert into log_running (datetime) VALUES(getdate())

-- Index the table
CREATE INDEX [idx_log_run_datesite] ON [dbo].[log_running]([datetime] DESC) ON [PRIMARY]

-- Here's a function that returns the current date and strips the time
CREATE FUNCTION dbo.f_frnk_util_DateOnly  (@dDate as datetime)  
RETURNS datetime  AS  
BEGIN 
DECLARE @d datetime
select @d=convert(datetime,convert(varchar(10),@dDate,101))
return @d
END

-- Ok, let's check how many entries in the log are from today

-- QUERY 1
select count(*) from log_running where datetime>=dbo.f_frnk_util_DateOnly(getdate())

-- QUERY 2 (adjust for today's real date)
select count(*) from log_running where datetime>='2/17/2002 12:00:00AM'

Query Analyzer says the cost of query 1 is 0.392 on my system. It says query 2 costs .471. So Query 1 is better, right? Nope. Query 1 takes 11 seconds to run, query 2 takes 2 seconds.

What I've discovered is that the query cost does not include the use of the user defined function. At first, I thought my mistake was in passing getdate() to the function. Replacing getdate() with a datetime variable that's set to getdate() before the select does not improve performance.

What's going on? Am I really off better tracking down every use of a function and replacing it with inline code that does the same thing? If so, what's the point of UDF's?

Beyond that, how can I tell if I have bad logic in a function if their performance isn't factored into the query plan?

Mystified,
-b

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2002-02-17 : 20:32:51

aiken,

As you have just found out, the graphical execution plan is not the definitive answer.. Have you tried using these?

SET STATISTICS TIME ON
SET STATISTICS IO ON
or
SET SHOWPLAN_ALL ON

DavidM

Tomorrow is the same day as Today was the day before.

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-02-17 : 20:46:26

Why don't you just use this:

SELECT count(*) FROM log_running WHERE DateDiff(dd, [datetime], getdate())=0

By the way, did you name a column after a reserved word (datetime)? That's a sure-fire way to have code that doesn't work properly. NEVER use a SQL reserved word for object or column names.

There is an entry in Books Online called "user-defined function recommendations", also called "best practices".

As far as including the function's plan in query analyzer, I can't see that it would help you even if it was. Your function does not access a table (it returns a scalar value), so the optimizer has no other table access to factor into the plan. The overhead of the function running on each row is where the extra time comes into play. That's why the 2nd version is faster; the value is fixed.

You can keep the function and do this:

DECLARE @dateVal datetime
SELECT @dateVal=dbo.f_frnk_util_DateOnly(getdate())
SELECT count(*) FROM log_running WHERE datetime>=@dateVal

It's equivalent to your 2nd query.

I don't think it's accurate to suggest UDFs perform badly; you just need to use them properly. Read all the entries in Books Online regarding UDFs and try some experimentation, you'll find where they work best and where to avoid them. You should also read up on and understand deterministic and non-deterministic functions.

Edited by - robvolk on 02/17/2002 20:54:52

aiken
Aged Yak Warrior

525 Posts

Posted - 2002-02-17 : 20:56:10

Thanks for the great (and speedy) responses.

I have definitely learned my lesson regarding the graphical showplan.

As for the column name, I can plead not my fault! I inherited this system and changing the column name would be prohibitive as far as appdev and code changes. Believe me, if I have to type [datetime] one more time, I am going to find the previous DBA and strangle him.

I have ended up using the declare variable, set variable to function value, use variable in query approach. The reason I didn't use the datediff part is because the table can actually span years, and I didn't want to get into dy and year dateparts all over the place; I thought the simple function would help out. Turns out I was wrong :)

(Edited to add: I will read BOL regarding functions, so forgive me if the following is obvious/redundant).

What I don't understand is why the query optimizer can't figure out that the function is deterministic; it does not reference any tables, has no if/thens; it's basically a wrapper for a couple of converts based on the input value. Shouldn't be that hard to figure out that any given input value will always have the same output value, and collapse that into the query rather than re-running the function for every single row.

Anyways, thanks again. I'm moving along again here, but I'm definitely going to reexamine my use of UDF's to see if there are other places that I'm forcing the server to do massive computations where I was hoping the UDF would be evaluated just once.

Is there ever a case where the UDF *is* evaluated just once? For instance,


CREATE FUNCTION dbo.f_ONE(@i int) AS
RETURNS INT
RETURN 1

...if I use that in a select query, is it going to be evalated for each row?

Thanks again -- I love sqlteam.com!

-b

Edited by - aiken on 02/17/2002 20:57:32

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2002-02-17 : 21:00:48

Good call Rob!

As for performance of UDF's.....

I have been experimenting with using them instead of Foreign Keys for simulating Domains..

Lets take PostCodes as our domain...


--PostCodes Table (Domain)
Create table PostCodes(PostCode smallint not null Primary Key)
go
--Populate PostCode
DECLARE @Loop int
SET @LOOP = 1000
WHILE @LOOP < 1100
BEGIN
	INSERT PostCodes (PostCode) VALUES (@Loop)
SET @LOOP = @LOOP + 1
END
go
--PostCode Existence Check
CREATE FUNCTION ufCheckPostCode
(@PostCode int)
RETURNS BIT
AS
BEGIN
DECLARE @Bit BIT
IF EXISTS(SELECT 1 FROM PostCodes WHERE PostCode = @PostCode)
	SET @Bit = 1
ELSE
	SET @Bit = 0
RETURN @Bit
END
go
--Address Table using UDF as Domain 
CREATE TABLE AddressUDF(AddressID int not null Identity(1,1) Primary Key, PostCode SMALLINT NOT NULL CHECK(dbo.ufCheckPostCode(PostCode)= 1))
--Address Table using FK as Domain 
CREATE TABLE AddressFK(AddressID int not null Identity(1,1) Primary Key, PostCode SMALLINT NOT NULL REFERENCES PostCodes(PostCode))
go

As you can see there I have created 2 tables one using UDF's the other FK..

Now run the following


SET STATISTICS TIME ON
SET STATISTICS IO OFF
SET STATISTICS PROFILE OFF
print 'UDF Insert'
INSERT AddressUDF(PostCode) VALUES (1001)
print 'FK Insert'
INSERT AddressFK(PostCode) VALUES (1001)
print 'UDF Update'
UPDATE AddressUDF SET PostCode = 1002 WHERE PostCode = 1001
print 'FK Update'
UPDATE AddressFK SET PostCode = 1002 WHERE PostCode = 1001
print 'UDF Delete'
DELETE AddressUDF WHERE PostCode = 1002
print 'FK Delete'
DELETE AddressFK WHERE PostCode = 1002

I was surprised at the result....UDF faster than FK...

Any comments? Any one?

DavidM

Tomorrow is the same day as Today was the day before.

Merkin
Funky Drop Bear Fearing SQL Dude!

4970 Posts

Posted - 2002-02-17 : 21:01:03

quote:

Believe me, if I have to type [datetime] one more time, I am going to find the previous DBA and strangle him.

I just thought of a great new feature for SQLTeam.
A "crackhead DBA Register". Where SQLTeam memebers can register the names of DBAs and "consultants" who implemented crackhead data models and cursor based solutions. Anyone that sees their name on the list will have the opportunity to try to defend their design, and we can roast them

Then, anyone here in the position of hiring DBAs and Developers can check names against the register so they don't have the same mistakes made.

I know of one guy who's first name starts with a D and who's second name starts with a T who wouldn't pick up much work if the word spread.

What do you all think ? Graz, are you listening ?

Damian

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2002-02-17 : 21:14:44

Sounds harsh Damian... But definately fair...

DavidM

Tomorrow is the same day as Today was the day before.

Merkin
Funky Drop Bear Fearing SQL Dude!

4970 Posts

Posted - 2002-02-17 : 21:18:03

Just think how many posts we have seen here that start with

"I have inherited a system that uses 4 nested cursors creating a CSV file to be inserted into an Access database that gets DTSed into an Excel report" etc etc

I can see an end being put to all of that!

Actually, I am completely delusional, but it's a nice thought.

Damian

aiken
Aged Yak Warrior

525 Posts

Posted - 2002-02-17 : 21:25:05

Well, I've seen a few "worst practice" articles going around recently, but I tell you that I, for one, would benefit from some kind of simple "do not do these things" FAQ with single line nasty practices and a link to a good article explaining why.

I can definitely blame some of my DB's problems on a previous administrator, but I personally created almost 200 stored procedures named sp_* before I learned better. Among other embarrassing problems (foreign keys being a different data type, clustered indexes on identity columns in heavily-inserted tables, using cursors, I could go on and on... I just wonder what stupid mistakes I'm *still making*).

Cheers
-b

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-02-17 : 21:29:03

quote:
I was surprised at the result....UDF faster than FK...

Any comments? Any one?

It's a shot in the dark, but:

1. Using EXISTS is historically faster than other operators;
2. FOREIGN KEY requires unique constraint or index, but may not use it in the lookup process, while your function probably would

Have you tried this using an INSERT...SELECT...FROM operation, and multi-row UPDATE/DELETE operations? It might only be faster on single-row ops and crap out on large multi-row ones.

quote:
A "crackhead DBA Register". Where SQLTeam members can register the names of DBAs and "consultants" who implemented crackhead data models and cursor based solutions. Anyone that sees their name on the list will have the opportunity to try to defend their design, and we can roast them

While I would love to see something like that, I doubt we'll get a lot of new members if they see their name in it...

...and as you suggested

we've had a crackhead defend (strenuously!) their bad design/cursors, while never admitting they were bad, even when (exhaustively) proven.

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2002-02-17 : 21:31:50

Thanks for the feedback Rob...

I will look at the batch insert and delete..

DavidM

Tomorrow is the same day as Today was the day before.

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2002-02-17 : 22:23:43

Some quick feedback Rob....

Bulk Inserting - UDF faster
Bulk Update - UDF faster
Bulk Delete - Same (As you would expect - No FK or UDF Check)

DavidM

Tomorrow is the same day as Today was the day before.

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-02-18 : 05:14:56

Woah! Hold up! Wait a second!
Aiken, are you saying that you are seeing different performance between this:

select count(*) from log_running where datetime>=dbo.f_frnk_util_DateOnly(getdate())

and the code Rob posted:

DECLARE @dateVal datetime
SELECT @dateVal=dbo.f_frnk_util_DateOnly(getdate())
SELECT count(*) FROM log_running WHERE datetime>=@dateVal

Given the table set-up you posted, I get the same run times (~100ms) for both these and the one with the literal date. Obviously, the queries using the UDF get silly cost estimates, but that's the optimizer not being able to predict the return value of the UDF and relying on its heuristics for inequality matches -- it's guessing that only 30% of rows will match the predicate. For the fixed value, it can tell from the statistics that (virtually) all the rows will match. SQL Server 2000 sp2.

Edit: Now here's a strange thing. Drop the index and the query plans have almost identical cost estimations (fair enough: it will have to scan the whole table). But now the call to the UDF gets pushed out into a filter and executed for every row. Result: the UDF-calling query now takes over a minute. And yes, it's the UDF call because inlining the code of the UDF makes it run sensibly.

Edit2: Stranger still, without the index, SET STATISTICS TIME ON reports that while the other queries are taking 135ms, the UDF one takes 0ms!

Edited by - Arnold Fribble on 02/18/2002 05:44:01

aiken
Aged Yak Warrior

525 Posts

Posted - 2002-02-18 : 12:28:53

Yep, I am seeing different run times for the two approaches; assigning the variable is quite a bit quicker for me. However, I am SQL2K, SP1, so maybe SP2 fixes something in this area?

Cheers
-b

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-02-18 : 13:17:27

There's nothing to fix, that's a normal side effect in pretty much any programming language. Pulling a function through a loop is always slower than pulling a variable, or a constant value, because of the overhead of processing the function. The ideal is to limit the areas where a function is called unnecessarily, especially in a loop. This also applies to objects and their properties. You should assign a property value to a variable and then process it in a loop, instead of calling the object repeatedely.

This site is pretty good in ways of optimizing your code. While it's VB a lot of the ideas apply to other languages, including SQL:

www.vb2themax.com

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-02-18 : 16:51:08

I don't think it's as cut-and-dried as that. SQL SELECTs are, after all, pure functional for the most part. There appears to be a limitation on when an independent UDF call will be pulled out of the loop. In SP2 it seems to be the case that a UDF call can be put into the SEEK argument, but not the WHERE argument of a Table/Index Scan or Index Seek.
While I'd agree that it's important to remember that optimizers of any stripe are not perfect -- relational query optimization is, after all, NP-hard -- this sort of inconsistancy of behaviour is at best odd.

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2002-02-18 : 16:55:51

quote:

this sort of inconsistancy of behaviour is at best odd.

And at its worst...makes DBA go crazy!

DavidM

Tomorrow is the same day as Today was the day before.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources