| Author |
Topic |
|
aiken
Aged Yak Warrior
525 Posts |
Posted - 2002-02-17 : 20:17:01
|
I'm starting to discover that user defined functions are a big performance problem. Worse, it seems that they are not factored into the execution plan as seen in query analyzer.Here's an example:-- Create basic log tableCREATE TABLE log_running (i int identity,datetime datetime)-- Insert a bunch of dataDECLARE @iX intWhile @iX<250000 insert into log_running (datetime) VALUES(getdate())-- Index the tableCREATE INDEX [idx_log_run_datesite] ON [dbo].[log_running]([datetime] DESC) ON [PRIMARY]-- Here's a function that returns the current date and strips the timeCREATE FUNCTION dbo.f_frnk_util_DateOnly (@dDate as datetime) RETURNS datetime AS BEGIN DECLARE @d datetimeselect @d=convert(datetime,convert(varchar(10),@dDate,101))return @dEND-- Ok, let's check how many entries in the log are from today-- QUERY 1select count(*) from log_running where datetime>=dbo.f_frnk_util_DateOnly(getdate())-- QUERY 2 (adjust for today's real date)select count(*) from log_running where datetime>='2/17/2002 12:00:00AM' Query Analyzer says the cost of query 1 is 0.392 on my system. It says query 2 costs .471. So Query 1 is better, right? Nope. Query 1 takes 11 seconds to run, query 2 takes 2 seconds.What I've discovered is that the query cost does not include the use of the user defined function. At first, I thought my mistake was in passing getdate() to the function. Replacing getdate() with a datetime variable that's set to getdate() before the select does not improve performance.What's going on? Am I really off better tracking down every use of a function and replacing it with inline code that does the same thing? If so, what's the point of UDF's? Beyond that, how can I tell if I have bad logic in a function if their performance isn't factored into the query plan? Mystified,-b |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2002-02-17 : 20:32:51
|
| aiken,As you have just found out, the graphical execution plan is not the definitive answer.. Have you tried using these?SET STATISTICS TIME ONSET STATISTICS IO ONorSET SHOWPLAN_ALL ONDavidMTomorrow is the same day as Today was the day before. |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2002-02-17 : 20:46:26
|
| Why don't you just use this:SELECT count(*) FROM log_running WHERE DateDiff(dd, [datetime], getdate())=0By the way, did you name a column after a reserved word (datetime)? That's a sure-fire way to have code that doesn't work properly. NEVER use a SQL reserved word for object or column names.There is an entry in Books Online called "user-defined function recommendations", also called "best practices".As far as including the function's plan in query analyzer, I can't see that it would help you even if it was. Your function does not access a table (it returns a scalar value), so the optimizer has no other table access to factor into the plan. The overhead of the function running on each row is where the extra time comes into play. That's why the 2nd version is faster; the value is fixed.You can keep the function and do this:DECLARE @dateVal datetimeSELECT @dateVal=dbo.f_frnk_util_DateOnly(getdate())SELECT count(*) FROM log_running WHERE datetime>=@dateValIt's equivalent to your 2nd query.I don't think it's accurate to suggest UDFs perform badly; you just need to use them properly. Read all the entries in Books Online regarding UDFs and try some experimentation, you'll find where they work best and where to avoid them. You should also read up on and understand deterministic and non-deterministic functions.Edited by - robvolk on 02/17/2002 20:54:52 |
 |
|
|
aiken
Aged Yak Warrior
525 Posts |
Posted - 2002-02-17 : 20:56:10
|
Thanks for the great (and speedy) responses.I have definitely learned my lesson regarding the graphical showplan.As for the column name, I can plead not my fault! I inherited this system and changing the column name would be prohibitive as far as appdev and code changes. Believe me, if I have to type [datetime] one more time, I am going to find the previous DBA and strangle him.I have ended up using the declare variable, set variable to function value, use variable in query approach. The reason I didn't use the datediff part is because the table can actually span years, and I didn't want to get into dy and year dateparts all over the place; I thought the simple function would help out. Turns out I was wrong :)(Edited to add: I will read BOL regarding functions, so forgive me if the following is obvious/redundant).What I don't understand is why the query optimizer can't figure out that the function is deterministic; it does not reference any tables, has no if/thens; it's basically a wrapper for a couple of converts based on the input value. Shouldn't be that hard to figure out that any given input value will always have the same output value, and collapse that into the query rather than re-running the function for every single row.Anyways, thanks again. I'm moving along again here, but I'm definitely going to reexamine my use of UDF's to see if there are other places that I'm forcing the server to do massive computations where I was hoping the UDF would be evaluated just once.Is there ever a case where the UDF *is* evaluated just once? For instance,CREATE FUNCTION dbo.f_ONE(@i int) ASRETURNS INTRETURN 1 ...if I use that in a select query, is it going to be evalated for each row?Thanks again -- I love sqlteam.com!-bEdited by - aiken on 02/17/2002 20:57:32 |
 |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2002-02-17 : 21:00:48
|
Good call Rob!As for performance of UDF's.....I have been experimenting with using them instead of Foreign Keys for simulating Domains..Lets take PostCodes as our domain...--PostCodes Table (Domain)Create table PostCodes(PostCode smallint not null Primary Key)go--Populate PostCodeDECLARE @Loop intSET @LOOP = 1000WHILE @LOOP < 1100BEGIN INSERT PostCodes (PostCode) VALUES (@Loop)SET @LOOP = @LOOP + 1ENDgo--PostCode Existence CheckCREATE FUNCTION ufCheckPostCode(@PostCode int)RETURNS BITASBEGINDECLARE @Bit BITIF EXISTS(SELECT 1 FROM PostCodes WHERE PostCode = @PostCode) SET @Bit = 1ELSE SET @Bit = 0RETURN @BitENDgo--Address Table using UDF as Domain CREATE TABLE AddressUDF(AddressID int not null Identity(1,1) Primary Key, PostCode SMALLINT NOT NULL CHECK(dbo.ufCheckPostCode(PostCode)= 1))--Address Table using FK as Domain CREATE TABLE AddressFK(AddressID int not null Identity(1,1) Primary Key, PostCode SMALLINT NOT NULL REFERENCES PostCodes(PostCode))go As you can see there I have created 2 tables one using UDF's the other FK..Now run the followingSET STATISTICS TIME ONSET STATISTICS IO OFFSET STATISTICS PROFILE OFFprint 'UDF Insert'INSERT AddressUDF(PostCode) VALUES (1001)print 'FK Insert'INSERT AddressFK(PostCode) VALUES (1001)print 'UDF Update'UPDATE AddressUDF SET PostCode = 1002 WHERE PostCode = 1001print 'FK Update'UPDATE AddressFK SET PostCode = 1002 WHERE PostCode = 1001print 'UDF Delete'DELETE AddressUDF WHERE PostCode = 1002print 'FK Delete'DELETE AddressFK WHERE PostCode = 1002 I was surprised at the result....UDF faster than FK...Any comments? Any one?DavidMTomorrow is the same day as Today was the day before. |
 |
|
|
Merkin
Funky Drop Bear Fearing SQL Dude!
4970 Posts |
Posted - 2002-02-17 : 21:01:03
|
quote: Believe me, if I have to type [datetime] one more time, I am going to find the previous DBA and strangle him.
I just thought of a great new feature for SQLTeam.A "crackhead DBA Register". Where SQLTeam memebers can register the names of DBAs and "consultants" who implemented crackhead data models and cursor based solutions. Anyone that sees their name on the list will have the opportunity to try to defend their design, and we can roast them Then, anyone here in the position of hiring DBAs and Developers can check names against the register so they don't have the same mistakes made.I know of one guy who's first name starts with a D and who's second name starts with a T who wouldn't pick up much work if the word spread.What do you all think ? Graz, are you listening ?Damian |
 |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2002-02-17 : 21:14:44
|
| Sounds harsh Damian... But definately fair...DavidMTomorrow is the same day as Today was the day before. |
 |
|
|
Merkin
Funky Drop Bear Fearing SQL Dude!
4970 Posts |
Posted - 2002-02-17 : 21:18:03
|
Just think how many posts we have seen here that start with"I have inherited a system that uses 4 nested cursors creating a CSV file to be inserted into an Access database that gets DTSed into an Excel report" etc etcI can see an end being put to all of that! Actually, I am completely delusional, but it's a nice thought.Damian |
 |
|
|
aiken
Aged Yak Warrior
525 Posts |
Posted - 2002-02-17 : 21:25:05
|
| Well, I've seen a few "worst practice" articles going around recently, but I tell you that I, for one, would benefit from some kind of simple "do not do these things" FAQ with single line nasty practices and a link to a good article explaining why.I can definitely blame some of my DB's problems on a previous administrator, but I personally created almost 200 stored procedures named sp_* before I learned better. Among other embarrassing problems (foreign keys being a different data type, clustered indexes on identity columns in heavily-inserted tables, using cursors, I could go on and on... I just wonder what stupid mistakes I'm *still making*).Cheers-b |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2002-02-17 : 21:29:03
|
quote: I was surprised at the result....UDF faster than FK...Any comments? Any one?
It's a shot in the dark, but:1. Using EXISTS is historically faster than other operators;2. FOREIGN KEY requires unique constraint or index, but may not use it in the lookup process, while your function probably wouldHave you tried this using an INSERT...SELECT...FROM operation, and multi-row UPDATE/DELETE operations? It might only be faster on single-row ops and crap out on large multi-row ones.quote: A "crackhead DBA Register". Where SQLTeam members can register the names of DBAs and "consultants" who implemented crackhead data models and cursor based solutions. Anyone that sees their name on the list will have the opportunity to try to defend their design, and we can roast them
While I would love to see something like that, I doubt we'll get a lot of new members if they see their name in it... ...and as you suggested we've had a crackhead defend (strenuously!) their bad design/cursors, while never admitting they were bad, even when (exhaustively) proven. |
 |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2002-02-17 : 21:31:50
|
| Thanks for the feedback Rob...I will look at the batch insert and delete..DavidMTomorrow is the same day as Today was the day before. |
 |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2002-02-17 : 22:23:43
|
| Some quick feedback Rob....Bulk Inserting - UDF fasterBulk Update - UDF fasterBulk Delete - Same (As you would expect - No FK or UDF Check)DavidMTomorrow is the same day as Today was the day before. |
 |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2002-02-18 : 05:14:56
|
| Woah! Hold up! Wait a second!Aiken, are you saying that you are seeing different performance between this:select count(*) from log_running where datetime>=dbo.f_frnk_util_DateOnly(getdate())and the code Rob posted:DECLARE @dateVal datetimeSELECT @dateVal=dbo.f_frnk_util_DateOnly(getdate())SELECT count(*) FROM log_running WHERE datetime>=@dateValGiven the table set-up you posted, I get the same run times (~100ms) for both these and the one with the literal date. Obviously, the queries using the UDF get silly cost estimates, but that's the optimizer not being able to predict the return value of the UDF and relying on its heuristics for inequality matches -- it's guessing that only 30% of rows will match the predicate. For the fixed value, it can tell from the statistics that (virtually) all the rows will match. SQL Server 2000 sp2.Edit: Now here's a strange thing. Drop the index and the query plans have almost identical cost estimations (fair enough: it will have to scan the whole table). But now the call to the UDF gets pushed out into a filter and executed for every row. Result: the UDF-calling query now takes over a minute. And yes, it's the UDF call because inlining the code of the UDF makes it run sensibly.Edit2: Stranger still, without the index, SET STATISTICS TIME ON reports that while the other queries are taking 135ms, the UDF one takes 0ms!Edited by - Arnold Fribble on 02/18/2002 05:44:01 |
 |
|
|
aiken
Aged Yak Warrior
525 Posts |
Posted - 2002-02-18 : 12:28:53
|
| Yep, I am seeing different run times for the two approaches; assigning the variable is quite a bit quicker for me. However, I am SQL2K, SP1, so maybe SP2 fixes something in this area?Cheers-b |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2002-02-18 : 13:17:27
|
| There's nothing to fix, that's a normal side effect in pretty much any programming language. Pulling a function through a loop is always slower than pulling a variable, or a constant value, because of the overhead of processing the function. The ideal is to limit the areas where a function is called unnecessarily, especially in a loop. This also applies to objects and their properties. You should assign a property value to a variable and then process it in a loop, instead of calling the object repeatedely.This site is pretty good in ways of optimizing your code. While it's VB a lot of the ideas apply to other languages, including SQL:www.vb2themax.com |
 |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2002-02-18 : 16:51:08
|
| I don't think it's as cut-and-dried as that. SQL SELECTs are, after all, pure functional for the most part. There appears to be a limitation on when an independent UDF call will be pulled out of the loop. In SP2 it seems to be the case that a UDF call can be put into the SEEK argument, but not the WHERE argument of a Table/Index Scan or Index Seek.While I'd agree that it's important to remember that optimizers of any stripe are not perfect -- relational query optimization is, after all, NP-hard -- this sort of inconsistancy of behaviour is at best odd. |
 |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2002-02-18 : 16:55:51
|
quote: this sort of inconsistancy of behaviour is at best odd.
And at its worst...makes DBA go crazy!DavidMTomorrow is the same day as Today was the day before. |
 |
|
|
|