rand() is freaky! - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

Transact-SQL (2000)

rand() is freaky!

Author

Topic

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-08-16 : 11:02:01

Hmmm... I don't think SQL Server's optimizer plays very friendly with nondeterministic functions. Compare and contrast:


SELECT x, x, x, x, x, x, x, x
FROM (
    SELECT RAND() AS x
  ) AS A


SELECT x, x, x, x, x, x, x, x
FROM (
    SELECT MAX(RAND()) AS x
  ) AS A

setbasedisthetruepath
Used SQL Salesman

992 Posts

Posted - 2002-08-16 : 12:11:33

Looks buggy to me. It appears that SQL Server is evaluating the function per reference, rather than per row. How did you come across this? Any references to it in the knowledge base?

Jonathan Boott, MCDBA
{0}

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-08-16 : 12:20:37

quote:

How did you come across this?

Using my patented (not really) and distinctly dodgy (but damned fast!) method for generating test data, I was running this:


CREATE TABLE Dt (
  pk int IDENTITY(1,1) PRIMARY KEY,
  startdate datetime NOT NULL,
  enddate datetime NOT NULL
)

INSERT INTO Dt
SELECT startdate,
  DATEADD(minute, FLOOR(RAND(CAST(NEWID() AS binary(4))) *
    1440) + (0*n), startdate)
FROM (
    SELECT n,
      DATEADD(minute, FLOOR(RAND(CAST(NEWID() AS binary(4)))*
        144000) + (0*n), '2002-01-01')) AS startdate
    FROM Numbers
  ) AS a

Now you can see there's already some kludgery going on here: the "+(0*n)" expressions are to persuade the optimizer to a generate call to RAND (and NEWID) for every row generated by Numbers, which is just a 100000 row tally table. The "+(0*n)"s probably aren't needed here, but I tend to stick it in because it can't hurt!

What shocked me was that I got instances where the enddate the preceded startdate. By changing the inner select to:


    SELECT n,
      MAX(DATEADD(minute, FLOOR(RAND(CAST(NEWID() AS binary(4))*
        144000) + (0*n), '2002-01-01')) AS startdate
    FROM Numbers
    GROUP BY n

I got the answer I expected.

Edited by - Arnold Fribble on 08/16/2002 12:25:45

setbasedisthetruepath
Used SQL Salesman

992 Posts

Posted - 2002-08-16 : 12:29:11

Interesting.

I would echo your thought that the

(0*n)

references are unnecessary since RAND() is nondeterministic - no column reference should be required to force SQL Server to evaluate a nondeterministic function once per row per reference.

I say per row per reference because my guess is that SQL Server's optimizer is converting

select x, x, x, ...
from ( rowset with rand() in it as x)

to

select rand(), rand(), rand(), ...

and this is incorrect. the top form should only evaluate once, the bottom each time.

Jonathan Boott, MCDBA
{0}

Edited by - setbasedisthetruepath on 08/16/2002 12:32:04

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-08-16 : 12:45:58

If you look at the execution plans, in the first form of the query there is a compute scalar step, and each reference is indeed evaluated as a separate expressions (Expr1000, Expr1001, etc.) The Max() version creates a stream aggregate instead of a scalar computation (makes sense) and just repeatedly references it.

setbasedisthetruepath
Used SQL Salesman

992 Posts

Posted - 2002-08-16 : 12:55:09

Right, though you're just restating the question.

Fribble is asserting that the execution plan generated is wrong, that the optimizer is not generating the proper plan based on the input SQL. I agree with him that the plan is wrong, the question is why, I suppose.

Jonathan Boott, MCDBA
{0}

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-08-16 : 12:56:00

Oh, I've remembered where I needed a 0*n. Although the MAX and GROUP BY n would have worked equally well. Try this:


DECLARE @i int
SET @i = 0

CREATE TABLE #Numbers (n int PRIMARY KEY)
SET NOCOUNT ON
BEGIN TRANSACTION
WHILE @i < 10000
BEGIN
  INSERT INTO #Numbers VALUES(@i)
  SET @i = @i + 1
END
COMMIT TRANSACTION
SET NOCOUNT OFF

SELECT x, COUNT(*)
FROM (
    SELECT x, COUNT(*) AS ct
    FROM (
        SELECT CAST(RAND(CAST(NEWID() AS binary(4)))* 100 AS int) AS x
        FROM #Numbers
      ) AS A
    GROUP BY x
  ) AS A
GROUP BY x
ORDER BY x

SELECT x, COUNT(*)
FROM (
    SELECT x, COUNT(*) AS ct
    FROM (
        SELECT CAST(RAND(CAST(NEWID() AS binary(4)))* 100 AS int) + 0*n AS x
        FROM #Numbers
      ) AS A
    GROUP BY x
  ) AS A
GROUP BY x
ORDER BY x

Edited by - Arnold Fribble on 08/16/2002 12:59:22

setbasedisthetruepath
Used SQL Salesman

992 Posts

Posted - 2002-08-16 : 13:05:24

This rand(newid()) is definitely confusing the optimizer.

Running the inner SQL of your first nested select:


    SELECT x, COUNT(*) AS ct
    FROM (
        SELECT CAST(RAND(CAST(NEWID() AS binary(4)))* 100 AS int) AS x
        FROM #Numbers
      ) AS A
    GROUP BY x
   order by x

you see that the GROUP BY is broken, and the execution plan shows that it is computing RAND() after the hash match/aggregate step.

Jonathan Boott, MCDBA
{0}

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-08-16 : 14:18:46

quote:

Hmmm... I don't think SQL Server's optimizer plays very friendly with nondeterministic functions. Compare and contrast:

I don't think it has to do with the function being deterministic...


CREATE function x ()
returns char(1)
AS
begin
return ('x')
end
go
set showplan_text on
go
select x,x,x,x,x,x
from (
    select dbo.x() as x) as a
go
select x,x,x,x,x,x
from (
    select max(dbo.x()) as x) as a
go
set showplan_text off
go

Jay White
{0}

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-08-16 : 14:34:40

I find this interesting too...


create view random
as
    select rand() as x
go

select x from random
select x,x,x,x,x
from (select x from random) a

Jay White
{0}

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-08-16 : 15:01:10

quote:

I don't think it has to do with the function being deterministic...

I'm not sure what you mean. The plans for those queries with the UDF correspond with those for the queries in the first post. The point I was trying to make is that only one of those query plans is valid for nondeterministic functions. The optimizer can legitimately call a deterministic function (with the same argument values) as many or as few times as it likes.

I'm not sure whether I'm surprised about the VIEW behaving in that way... this does the same...


select (select x from random), (select x from random)

How about this for a use of the original query:


DECLARE @az char(26)
SET @az = 'abcdefghijklmnopqrstuvwxyz'

SELECT n,
  LEFT(c+c+c+c+c+c+c+c+c+c+c+c+c+c+c+c+c+c+c+c,
    FLOOR(RAND(CAST(NEWID() AS binary(4)))*10)+10)
FROM (
    SELECT n,
      SUBSTRING(@az, CAST(FLOOR(RAND(CAST(NEWID() AS binary(4)))*LEN(@az)) AS int), 1) AS c
    FROM Numbers
    WHERE n < 1000
  ) AS A

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-08-16 : 15:23:20

quote:

I'm not sure what you mean.

I think, maybe, we are saying the same thing. The optimizer seem to not care if the funtion (without the max/grouping) is deterministic or not...it treats 'em all the same. dbo.x() should be dterministic, and rand() should not be, but they get the same query plan.

Jay White
{0}

setbasedisthetruepath
Used SQL Salesman

992 Posts

Posted - 2002-08-16 : 15:28:19

quote:
The optimizer can legitimately call a deterministic function (with the same argument values) as many or as few times as it likes.

Well, sure, though with the title 'optimizer' you would think it would call a deterministic function only once per row, since as you point out, intra-row calculations are guaranteed to return the same result given the same input.

Jonathan Boott, MCDBA
{0}

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources