
 Altering a UDF after a table is bound to it


LC
Starting Member

7 Posts

Posted - 2011-07-19 : 18:54:51
Is it possible to alter a function definition after a table is bound to it?

Here's a sample function -- it doesn't refer to any tables, so I'm effectively taking schemabinding out of the picture:

create function dbo.fna(@x int)
returns int
as
begin
    return @x + 2
end


Now a table that uses this function:

create table tbla (
a int,
b as dbo.fna(a)
)


Now say I add a few million rows of data to the table, so if I ever want to change the function definition, it's impractical to first drop the table.

So let's say I want to do this:

alter function dbo.fna(@x int)
returns int
as
begin
    return @x + 22
end


This doesn't work, of course:

Msg 3729, Level 16, State 3, Procedure fna, Line 1
Cannot ALTER 'dbo.fna' because it is being referenced by object 'tbla'.

I can do it by first altering the table definition to drop the computed column, then changing the function, and finally re-adding the computed column:

alter table tbla drop column b

alter function dbo.fna(@x int)
returns int
as
begin
    return @x + 22
end

alter table tbla add b as dbo.fna(a)


But in a real-life example with lots of computed columns bound to lots of functions, this is impractical. If a function is referenced 200 times, I have to drop 200 columns, alter the function, and add the 200 columns back again. It also has huge potential for introducing errors, since I'd need to make sure every computed column is recreated exactly as it was before (same expression, parameters, etc.).
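(For what it's worth, it looks like the affected columns can at least be enumerated. A minimal sketch, assuming the sys.sql_expression_dependencies catalog view that ships with SQL Server 2008; dbo.fna is the function from the example above:)

-- list every computed column whose definition references dbo.fna,
-- together with its definition, so the drop/alter/re-add could be scripted
select
    object_name(cc.object_id) as table_name,
    cc.name                   as column_name,
    cc.definition             as column_definition
from sys.computed_columns cc
join sys.sql_expression_dependencies d
    on  d.referencing_id       = cc.object_id
    and d.referencing_minor_id = cc.column_id
where d.referenced_id = object_id(N'dbo.fna')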

Is there a better way to alter a UDF that's referred to by a computed column?

robvolk
Most Valuable Yak

15732 Posts

Posted - 2011-07-19 : 19:38:57
If you expect to change the function definition, even once in a while, then I'd say it's better not to use a function at all in a computed column, especially for ones as simple as this. UDFs can carry TREMENDOUS performance penalties, since they're called for every row processed by a query (effectively turning it into a cursor). Multiply that by every function call and your queries will run like molasses.

You should consider using views that include the computed column instead of putting it into the table directly, and don't use a UDF if it can be avoided. That way you can make changes without impacting the table definition, and you lose nothing in performance.
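For example, with the sample table from the first post (assuming tbla keeps only the raw column a; the view name is just for illustration):

-- the expression lives in the view instead of a computed column,
-- so changing the logic later is a plain ALTER VIEW
create view dbo.vwTbla
as
select
    a,
    a + 2 as b
from dbo.tbla

Changing the calculation then touches only the view, never the table.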
Go to Top of Page

LC
Starting Member

7 Posts

Posted - 2011-07-20 : 16:14:08
Of course they're not going to be that simple -- that was a contrived example to illustrate the point. The actual functions will implement complex business logic and will be used in multiple places, hence the desire to implement the logic in one place for use everywhere.

I understand your point about the performance impact, but I look at it like normalization vs. denormalization -- the old rule of thumb is "normalize 'til it hurts, then denormalize 'til it works." Likewise with these functions I'm going to implement the tightest, most robust design I can, and THEN make changes as I need to for performance reasons.
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2011-07-20 : 16:28:29
quote:
I understand your point about the performance impact
To be sure, please read this:

http://sqlblogcasts.com/blogs/simons/archive/2008/11/03/TSQL-Scalar-functions-are-evil-.aspx

It's really not a question of how efficient or fast you can make the function; ultimately the per-call overhead will cause the majority of the delay, and it can't be reduced. This is why I mentioned cursors before: the overhead is applied to every row evaluated, and it multiplies quickly.

If your intention is to apply a calculation to a set of data and then return it, use a stored procedure. You can always parameterize it to return as much or as little data as you need, in a single set, and it only executes once.
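A rough sketch of the idea (the procedure name and the @minA parameter are made up for illustration):

create procedure dbo.uspGetTblA
    @minA int = 0   -- hypothetical parameter to return as much or as little as needed
as
begin
    set nocount on

    -- the calculation is an inline, set-based expression:
    -- no scalar UDF call per row
    select
        a,
        a + 2 as b
    from dbo.tbla
    where a >= @minA
end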
quote:
the old rule of thumb is "normalize 'til it hurts, then denormalize 'til it works."
I've never heard of this before, and I don't agree with it at all. If it works for you that's one thing, but to me it sounds like poor/lazy design, done twice. And applying functions as you described has nothing to do with normalization.
Go to Top of Page

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2011-07-20 : 16:30:22
You're far better off doing the computation on the way in and enforcing it with a check constraint.
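Something like this, as a minimal sketch (the table and constraint names are invented; a + 2 is the calculation from the earlier example):

create table dbo.tbla_stored (
    a int,
    b int,
    -- guarantees the stored value always matches the calculation
    constraint ck_tbla_stored_b check (b = a + 2)
)

-- the caller (or a trigger/default) computes the value on the way in:
insert dbo.tbla_stored (a, b) values (5, 5 + 2)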

There is no way to avoid the performance penalty that Rob is referring to, no matter how efficient your function, if you do it the way you propose.

Go to Top of Page

LC
Starting Member

7 Posts

Posted - 2011-08-23 : 19:06:51
quote:
Originally posted by robvolk
quote:
the old rule of thumb is "normalize 'til it hurts, then denormalize 'til it works."
I've never heard of this before,


http://www.google.com/search?q=normalize%20%27til%20it%20hurts%2C%20then%20denormalize%20%27til%20it%20works&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a&source=hp&channel=np

quote:
and I don't agree with it at all. If it works for you that's one thing, but to me it sounds like poor/lazy design, done twice. And applying functions as you described has nothing to do with normalization.


I didn't bring it up to suggest that applying functions as I describe has anything to do with normalization -- it doesn't. The example was to illustrate the same trade-off between theory and practice. In THEORY a fully-normalized design is optimal. In PRACTICE, performance considerations often require some amount of denormalization.

Same with the UDFs. In theory, the principles of encapsulation, code re-use, etc., dictate that I should want to use UDFs. But as you guys pointed out, there are serious performance factors to consider.

As it turns out with my particular data, doing it with the UDFs didn't significantly impact performance. I'm sure part of it is my hardware, part is the size of my data (I don't have terabytes upon terabytes -- I have three tables with about 1M rows each, two with about 100K each, and eight with less than 1K rows each), and my queries are all running with sub-second response times. So since performance considerations aren't mandating a change, I'm going to stick with the design that uses the better design principles.

...which brings me back to my original question, which was never answered. Does anyone have an easy way of modifying a UDF after a table has been bound to it?

Thanks again for your help, and for your advice on the performance considerations.
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2011-08-24 : 08:15:47
quote:
which brings me back to my original question, which was never answered. Does anyone have an easy way of modifying a UDF after a table has been bound to it?
There isn't one. You described the exact process needed in your first post, which is why I replied as I did.
quote:
Same with the UDFs. In theory, the principles of encapsulation, code re-use, etc., dictate that I should want to use UDFs. But as you guys pointed out, there are serious performance factors to consider...I'm going to stick with the design that uses the better design principles
I don't see how binding constantly-changing UDFs, which have to be unbound, altered, and re-bound, is a better design principle, but if you can make it work then so be it.
quote:
In THEORY a fully-normalized design is optimal. In PRACTICE, performance considerations often require an amount of denormalization.
I'm intrigued that you find the "theory" of normalization to be impractical, simply based on theoretical performance issues, but find the "theory" of UDF encapsulation, with its documented maintenance issues, to be sound.
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2011-08-24 : 09:54:48
You might want to watch this webcast regarding normalization and database design:

http://www.quest.com/events/ListDetails.aspx?ContentID=15071&utm_campaign=30551-12812-DB-EMEANA-Last%20chance%20Database%20Design%20Webcast&utm_medium=email&utm_source=Eloqua
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2011-08-24 : 10:08:09
We've given up using computed columns -- sorry, that's a bit of a sweeping statement, but we kept bumping into issues when using them. I can't remember the details now, but I seem to remember problems when we tried INSTEAD OF triggers, added indexes to views, and so on. Those constructions aren't the norm for us, but when we did need them, working around the computed columns was a PITA.

Now we either calculate the data "on the way in" and store it -- which encapsulates the logic but stores redundant data, of course -- or we expose the calculation in a VIEW and then use that VIEW whenever we want it on the output side, which lets us "improve" the logic in the computation retrospectively.

May be totally irrelevant to your conversation, in which case please ignore me!
Go to Top of Page

LC
Starting Member

7 Posts

Posted - 2011-08-24 : 10:17:56
quote:
Originally posted by robvolk

I don't see how binding constantly-changing UDFs, which have to be unbound, altered, and re-bound, is a better design principle, but if you can make it work then so be it.


Theory isn't affected by the idiosyncrasies of a particular product. Just because you have to do this in SQL Server doesn't mean you have to do it in other products, and it doesn't make it a bad principle. But yeah, in PRACTICE the fact that I have to unbind, alter, and re-bind is a consideration.

quote:
I'm intrigued that you find the "theory" of normalization to be impractical, simply based on theoretical performance issues, but find the "theory" of UDF encapsulation, with its documented maintenance issues, to be sound.


Yeah, you got the first half of that statement wrong, and I'm not sure what "theoretical performance issues" even means (did you mean ANTICIPATED performance issues?).

From a theoretical standpoint, normalization, encapsulation and code re-use are all good things, and by default I'm going to design with these things in mind.

From a practical standpoint, performance issues sometimes trump theory. A perfectly normalized design (well..."perfectly normalized" gets me into trouble, so let's just say a BCNF design) that's a dog with ordinary queries is a non-starter from a practical standpoint -- obviously. Hence the old normalize/de-normalize adage I referred to earlier. This is a case where practical considerations trump theoretical considerations -- nobody's going to care about how well normalized the design is if they have to wait a long time whenever their app hits the database.

The unbind/alter/re-bind thing isn't a performance issue -- it's an inconvenience. Well...again I'm getting myself into trouble here by not being careful enough with my choice of words -- it's another POTENTIAL performance issue, as you pointed out, so in practice you do the same thing as you do with normalization: benchmark it against realistic queries, data and server load, and if it's fine (and you don't expect future growth to be an issue), then you're done.

Since I'm getting pretty fast response times on my benchmarks, and future growth isn't an issue with this data set, I don't need to compromise the design due to the issue of performance.

So now we move on to YOUR issue, which is whether to compromise my design due to the issue of convenience. That's also an interesting -- but different -- question, since it's not only an inconvenience now (in development mode), but as you pointed out, will continue to be an issue in the future in maintenance mode.

Hence my original question -- being new to SQL Server, I was hoping there was a practical workaround to make this point moot.
Go to Top of Page

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2011-08-24 : 11:56:37
Biggest myth in the world is that in an OLTP system you pay a performance penalty for normalizing your schema.

Improper indexing and poor coding practices, however, always eventually cause performance issues.
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2011-08-24 : 12:20:39
quote:
Originally posted by russell

Biggest myth in the world is that in an OLTP system you pay a performance penalty for normalizing your schema.

Improper indexing and coding practices always eventually cause performance issues however.
Agreed. I've always had more trouble with denormalized schemas than with normalized ones, even on performance (and especially on maintenance). There are also deeper considerations besides performance, like data integrity, expandability, and maintainability. If all you really care about is raw performance, jump on the NoSQL bandwagon.

And I'm not knocking encapsulation and re-use, but I am stating that UDFs are not the best mechanism for that in SQL Server, especially with the scenario you're describing. That's based on past experience (and Kristen's too). It's extremely short-sighted to say it's fine now because you're not experiencing performance issues, and you don't anticipate data growth that would cause such issues. An ounce of prevention now will save you a metric ton of cure later.

And as Kristen mentioned, there's little or nothing computed columns will give you that views cannot, and views have the added benefit of not binding functions to columns, so altering a function doesn't require schema modifications. It's much easier to modify a view than a table, since it's not a physical structure (unless it's indexed). This is another design consideration as well: decoupling the logical model from the physical one, which is also a best practice. There's no reason (or benefit) for the application to deal with tables directly, especially if you anticipate frequent changes to the underlying functions.
quote:
Yeah, you got the first half of that statement wrong, and I'm not sure what "theoretical performance issues" even means (did you mean ANTICIPATED performance issues?)
I didn't get that part wrong, unless you've done extensive empirical comparisons of both normalized and denormalized structures, and all your measurements show one to be significantly better. If you're just going by "old normalize/de-normalize adages", then you're just assuming a normalized design will perform worse. In any case, I still recommend watching Louis' webcast tomorrow, he has a lot of good information that may help you.
Go to Top of Page
   
