Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2002-08-16 : 11:02:01
|
Hmmm... I don't think SQL Server's optimizer plays very friendly with nondeterministic functions. Compare and contrast:SELECT x, x, x, x, x, x, x, xFROM ( SELECT RAND() AS x ) AS A SELECT x, x, x, x, x, x, x, xFROM ( SELECT MAX(RAND()) AS x ) AS A |
|
setbasedisthetruepath
Used SQL Salesman
992 Posts |
Posted - 2002-08-16 : 12:11:33
|
Looks buggy to me. It appears that SQL Server is evaluating the function per reference, rather than per row. How did you come across this? Any references to it in the knowledge base?Jonathan Boott, MCDBA{0} |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2002-08-16 : 12:20:37
|
quote: How did you come across this?
Using my patented (not really) and distinctly dodgy (but damned fast!) method for generating test data, I was running this:CREATE TABLE Dt ( pk int IDENTITY(1,1) PRIMARY KEY, startdate datetime NOT NULL, enddate datetime NOT NULL)INSERT INTO DtSELECT startdate, DATEADD(minute, FLOOR(RAND(CAST(NEWID() AS binary(4))) * 1440) + (0*n), startdate)FROM ( SELECT n, DATEADD(minute, FLOOR(RAND(CAST(NEWID() AS binary(4)))* 144000) + (0*n), '2002-01-01')) AS startdate FROM Numbers ) AS a Now you can see there's already some kludgery going on here: the "+(0*n)" expressions are to persuade the optimizer to a generate call to RAND (and NEWID) for every row generated by Numbers, which is just a 100000 row tally table. The "+(0*n)"s probably aren't needed here, but I tend to stick it in because it can't hurt!What shocked me was that I got instances where the enddate the preceded startdate. By changing the inner select to: SELECT n, MAX(DATEADD(minute, FLOOR(RAND(CAST(NEWID() AS binary(4))* 144000) + (0*n), '2002-01-01')) AS startdate FROM Numbers GROUP BY n I got the answer I expected.Edited by - Arnold Fribble on 08/16/2002 12:25:45 |
|
|
setbasedisthetruepath
Used SQL Salesman
992 Posts |
Posted - 2002-08-16 : 12:29:11
|
Interesting.I would echo your thought that the (0*n) references are unnecessary since RAND() is nondeterministic - no column reference should be required to force SQL Server to evaluate a nondeterministic function once per row per reference.I say per row per reference because my guess is that SQL Server's optimizer is convertingselect x, x, x, ...from ( rowset with rand() in it as x)toselect rand(), rand(), rand(), ...and this is incorrect. the top form should only evaluate once, the bottom each time.Jonathan Boott, MCDBA{0}Edited by - setbasedisthetruepath on 08/16/2002 12:32:04 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2002-08-16 : 12:45:58
|
If you look at the execution plans, in the first form of the query there is a compute scalar step, and each reference is indeed evaluated as a separate expressions (Expr1000, Expr1001, etc.) The Max() version creates a stream aggregate instead of a scalar computation (makes sense) and just repeatedly references it. |
|
|
setbasedisthetruepath
Used SQL Salesman
992 Posts |
Posted - 2002-08-16 : 12:55:09
|
Right, though you're just restating the question.Fribble is asserting that the execution plan generated is wrong, that the optimizer is not generating the proper plan based on the input SQL. I agree with him that the plan is wrong, the question is why, I suppose.Jonathan Boott, MCDBA{0} |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2002-08-16 : 12:56:00
|
Oh, I've remembered where I needed a 0*n. Although the MAX and GROUP BY n would have worked equally well. Try this:DECLARE @i intSET @i = 0CREATE TABLE #Numbers (n int PRIMARY KEY)SET NOCOUNT ONBEGIN TRANSACTIONWHILE @i < 10000BEGIN INSERT INTO #Numbers VALUES(@i) SET @i = @i + 1ENDCOMMIT TRANSACTIONSET NOCOUNT OFFSELECT x, COUNT(*)FROM ( SELECT x, COUNT(*) AS ct FROM ( SELECT CAST(RAND(CAST(NEWID() AS binary(4)))* 100 AS int) AS x FROM #Numbers ) AS A GROUP BY x ) AS AGROUP BY xORDER BY xSELECT x, COUNT(*)FROM ( SELECT x, COUNT(*) AS ct FROM ( SELECT CAST(RAND(CAST(NEWID() AS binary(4)))* 100 AS int) + 0*n AS x FROM #Numbers ) AS A GROUP BY x ) AS AGROUP BY xORDER BY x Edited by - Arnold Fribble on 08/16/2002 12:59:22 |
|
|
setbasedisthetruepath
Used SQL Salesman
992 Posts |
Posted - 2002-08-16 : 13:05:24
|
This rand(newid()) is definitely confusing the optimizer.Running the inner SQL of your first nested select: SELECT x, COUNT(*) AS ct FROM ( SELECT CAST(RAND(CAST(NEWID() AS binary(4)))* 100 AS int) AS x FROM #Numbers ) AS A GROUP BY x order by x you see that the GROUP BY is broken, and the execution plan shows that it is computing RAND() after the hash match/aggregate step.Jonathan Boott, MCDBA{0} |
|
|
Page47
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2002-08-16 : 14:18:46
|
quote: Hmmm... I don't think SQL Server's optimizer plays very friendly with nondeterministic functions. Compare and contrast:
I don't think it has to do with the function being deterministic...CREATE function x ()returns char(1)ASbeginreturn ('x')endgoset showplan_text ongoselect x,x,x,x,x,xfrom ( select dbo.x() as x) as agoselect x,x,x,x,x,xfrom ( select max(dbo.x()) as x) as agoset showplan_text offgo Jay White{0} |
|
|
Page47
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2002-08-16 : 14:34:40
|
I find this interesting too...create view randomas select rand() as xgoselect x from randomselect x,x,x,x,xfrom (select x from random) a Jay White{0} |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2002-08-16 : 15:01:10
|
quote: I don't think it has to do with the function being deterministic...
I'm not sure what you mean. The plans for those queries with the UDF correspond with those for the queries in the first post. The point I was trying to make is that only one of those query plans is valid for nondeterministic functions. The optimizer can legitimately call a deterministic function (with the same argument values) as many or as few times as it likes.I'm not sure whether I'm surprised about the VIEW behaving in that way... this does the same...select (select x from random), (select x from random) How about this for a use of the original query:DECLARE @az char(26)SET @az = 'abcdefghijklmnopqrstuvwxyz'SELECT n, LEFT(c+c+c+c+c+c+c+c+c+c+c+c+c+c+c+c+c+c+c+c, FLOOR(RAND(CAST(NEWID() AS binary(4)))*10)+10)FROM ( SELECT n, SUBSTRING(@az, CAST(FLOOR(RAND(CAST(NEWID() AS binary(4)))*LEN(@az)) AS int), 1) AS c FROM Numbers WHERE n < 1000 ) AS A |
|
|
Page47
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2002-08-16 : 15:23:20
|
quote: I'm not sure what you mean.
I think, maybe, we are saying the same thing. The optimizer seem to not care if the funtion (without the max/grouping) is deterministic or not...it treats 'em all the same. dbo.x() should be dterministic, and rand() should not be, but they get the same query plan.Jay White{0} |
|
|
setbasedisthetruepath
Used SQL Salesman
992 Posts |
Posted - 2002-08-16 : 15:28:19
|
quote: The optimizer can legitimately call a deterministic function (with the same argument values) as many or as few times as it likes.
Well, sure, though with the title 'optimizer' you would think it would call a deterministic function only once per row, since as you point out, intra-row calculations are guaranteed to return the same result given the same input.Jonathan Boott, MCDBA{0} |
|
|
|
|
|
|
|