SQL Server Forums
Profile | Register | Active Topics | Members | Search | Forum FAQ
 
Register Now and get your question answered!
Username:
Password:
Save Password
Forgot your Password?

 All Forums
 General SQL Server Forums
 Script Library
 Random Integer, Sample, and Datetime Functions
 New Topic  Reply to Topic
 Printer Friendly
Author Previous Topic Topic Next Topic  

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

USA
7020 Posts

Posted - 07/23/2006 :  16:23:16  Show Profile  Reply with Quote
Edit: There seems to be some interaction between GROUP BY/NEWID() that can cause problems if these functions are used directly in an aggregate query. Please read the posts on this thread about this, and use caution.


This script creates three functions, F_RANDOM_INTEGER, F_RANDOM_SAMPLE, and F_RANDOM_DATETIME. The last parameter of each function must be function NEWID() to generate the random number.

Theses functions are designed for applications where it is necessary to generate random integers, random datetimes, or take random samples from sets of data. Typical applications would be software testing, inventory control, auditing, and product quality testing.

Function F_RANDOM_INTEGER returns a random integer in the range of the input parameters so that the return value is >= @START_INT and <= @END_INT. It is valid for any range of two integer values.

Function F_RANDOM_SAMPLE returns a 1 or a 0 to determine if a sample should be selected, based on the input sample rate. Input parameter @SAMPLE_RATE should be between 1 and 999,999. The sample rate determines how many samples should be selected out of each 1,000,000 samples. A sample rate below 1 will never select a sample, and a sample rate above 999,999 will always select a sample. A sample rate of 1,000 = 0.1%, 50,000 = 5%, 63,775 = 6.3775%, 100,000 = 10%, and 500,000 = 50%.

F_RANDOM_DATETIME returns a random datetime value >= @START_TIME and < @END_TIME. It is valid for any datetime range. Input parameters default, if null, to @START_TIME = '19000101' and @END_TIME = '19000102’. The datetime is random to the level of clock ticks (1/300 of as second). Note that the latest time is not included in the range of datatime values that can be returned. This is to allow selection of times within adjacent time periods, without having to specify times to the level of milliseconds. This means a range of 1990-12-01 01:00:00.000 through 1990-12-01 02:00:00.000 will never return a value of 1990-12-01 02:00:00.000.

The NEWID() function is the basis of the random numbers. These functions should not to be considered random for purposes of data encryption or other high security applications. However, they should be adequate for business applications of the types mentioned above. I conducted extensive testing with the functions where I generated millions of results, analyzed the results various ways to look for non-random patterns, and I saw no evidence of non-random results.

The script also includes a demo of each function, and sample output from the demos is also included.

The demo script uses the number table function, F_TABLE_NUMBER_RANGE, available on this link:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=47685





if objectproperty(object_id('dbo.F_RANDOM_INTEGER'),'IsScalarFunction') = 1
	begin drop function dbo.F_RANDOM_INTEGER end
go
create function dbo.F_RANDOM_INTEGER
	(
	@START_INT	int,
	@END_INT	int,
	@NEWID		uniqueidentifier
	)
returns
	int
as
/*
Function: F_RANDOM_INTEGER

	This function returns a random integer
	value >= @START_INT and <= @END_INT.

	Valid for any integer range.

	Requires newid() to be input parameter @NEWID
	to generate the random number.

-- Return random integer between -100, 200000
select [Random Integer] = 
	[dbo].[F_RANDOM_INTEGER](-100, 200000,newid())
*/
begin

declare @sn bigint
declare @en bigint
declare @mod bigint
declare  @rand_bigint  bigint
declare  @rand_result  int

-- Set default values for input dates if they are null
if @START_INT is null begin set @START_INT = 0 end
if @END_INT is null begin set @END_INT =1000000 end

-- Set order of input parameters so that return value is always
-- the same no matter what order the input values are passed.
if @START_INT > @END_INT
	select @sn = @END_INT, @en = @START_INT
else
	select @sn = @START_INT, @en = @END_INT

-- Return start int if start int = end int
if  @sn = @en return @sn

-- Get modulus
select @mod = @en-@sn+1

-- Get random bigint from input parameter @NEWID
select @rand_bigint = abs(convert(bigint,convert(varbinary(20),@NEWID)))


-- Get random integer
select @rand_result = @sn+(@rand_bigint%@mod)

return @rand_result

end
go
grant execute on dbo.F_RANDOM_INTEGER to public
go

if objectproperty(object_id('dbo.F_RANDOM_SAMPLE'),'IsScalarFunction') = 1
	begin drop function dbo.F_RANDOM_SAMPLE end
go
create function dbo.F_RANDOM_SAMPLE
	(
	@SAMPLE_RATE	int,
	@NEWID		uniqueidentifier
	)
returns
	int
as
/*
Function: F_RANDOM_SAMPLE

	This function returns a 1 or a 0 to determine if a sample
	should be selected, based on the sample rate.  It is designed
	to select random samples at a specific rate.

	Input parameter @SAMPLE_RATE should be between 1 and 999,999. 
	The sample rate determines how many samples should be
	selected out of each 1,000,000 samples.  A sample rate below 1
	will never select a sample, and a sample rate above 999,999 will
	always select a sample.  1,000 = 0.1%, 50,000 = 5%, 63,775 = 6.3775%, 
	100,000 = 10%, and 500,000 = 50%

	Requires newid() to be input parameter @NEWID
	to generate the random number.

-- Select sample 200,000 times in 1,000,000 samples (20%)
select [Random Sample] = 
	[dbo].[F_RANDOM_SAMPLE](200000,newid())
*/
begin

declare  @rand_bigint  bigint

-- Get random bigint from @NEWID
select @rand_bigint = abs(convert(bigint,convert(varbinary(20),@NEWID)))


-- Select sample if the modulus of @rand_bigint is less than the sample rate 
return  case when @rand_bigint%1000000 < @SAMPLE_RATE then 1 else 0 end

end
go
grant execute on dbo.F_RANDOM_SAMPLE to public
go

if objectproperty(object_id('dbo.F_RANDOM_DATETIME'),'IsScalarFunction') = 1
	begin drop function dbo.F_RANDOM_DATETIME end
go
create function dbo.F_RANDOM_DATETIME
	(
	@START_TIME	datetime,
	@END_TIME	datetime,
	@NEWID		uniqueidentifier
	)
returns
	datetime
as
/*
Function: F_RANDOM_DATETIME

	This function returns a random datetime
	value >= @START_TIME and < @END_TIME.

	Valid for any datetime range.

	Input parameters default, if null, to:
	@START_TIME	'19000101'
	@END_TIME	'19000102'

	Requires newid() to be input parameter @NEWID
	to generate the random number.

-- Return random time between 08:30 and 12:00
select [Random Time] = 
	[dbo].[F_RANDOM_DATETIME]('08:30:00.000','12:00:00.000',newid())
*/
begin

declare @st datetime
declare @et datetime

declare @hours int
declare @ms int
declare @ticks bigint
declare @rand_ticks bigint
declare @rand_bigint bigint

declare @remaining_ticks int
declare @return_hours int
declare @return_ms int

-- Set default values for input dates if they are null
if @START_TIME is null begin set @START_TIME = '19000101' end
if @END_TIME is null begin set @END_TIME = '19000102' end

-- Set order of input parameters so that return value is always
-- the same no matter what order the input values are passed.
if @START_TIME > @END_TIME
	select @st = @END_TIME, @et = @START_TIME
else
	select @st = @START_TIME, @et = @END_TIME

-- Return start time if start time = end time
if  @st = @et return @st

-- Get hours boundary difference.
-- Subtract 1 from diff, before dividing by 2 and multiplying by 2
-- so the milliseconds remaining is always positive and
-- hours is always >= zero.
set @hours = ((datediff(hh,@st,@et)-1)/2)*2

-- Get remainder milliseconds
set @ms = datediff(ms,0,@et-dateadd(hh,@hours,@st))

-- Convert remainder milliseconds to
-- SQL Server 'clock ticks' of 1/300 of a second
set @ticks = ((@ms/10)*3) + ((@ms%10)/3)

-- Add hours * tick per hour (3600*300) to give total
-- ticks between  @START_TIME and @END_TIME
set @ticks = @ticks + (@hours * 0000001080000 )

-- Get random bigint from input parameter @NEWID
select @rand_bigint = abs(convert(bigint,convert(varbinary(20),@NEWID)))

-- Get random number of ticks
select @rand_ticks =  @rand_bigint%@ticks

-- Get hours component of random ticks
select @return_hours =  @rand_ticks/1080000

-- Get left over ticks after removing hours.
select @remaining_ticks =  @rand_ticks%1080000

--Convert remaining clock ticks back to milliseconds
select @return_ms = ((@remaining_ticks/3)*10) + floor(((@remaining_ticks%3)*3.5))

-- Return the random time between the start and end time
return dateadd(ms,@return_ms,dateadd(hh,@return_hours,@st))

end
go
grant execute on dbo.F_RANDOM_DATETIME to public
go

print '-----------------------------------------------------------------'
print '  Demo F_RANDOM_INTEGER function'
print '-----------------------------------------------------------------'
print ''

declare @t table ([Random Integer] int not null )

insert into  @t
select
	-- Get integert in range of 1 to 10,000,000
	[Random Integer] = 
		[dbo].[F_RANDOM_INTEGER](1,10000000,newid() )
from
	-- Function F_TABLE_NUMBER_RANGE
	-- available in Script Library forum
	F_TABLE_NUMBER_RANGE(1,100000)


select
	[Right Int] = [Random Integer]%10,
	[Count] = count(*)
from
	@t a
group by
	 [Random Integer]%10
order by
	1,2

select
	[Million Range] = [Random Integer]/1000000,
	[Count] = count(*)
from
	@t a
group by
	 [Random Integer]/1000000
order by
	1,2

go
print '-----------------------------------------------------------------'
print '  Demo F_RANDOM_SAMPLE function'
print '-----------------------------------------------------------------'
print ''

declare @t table ([Sample Taken] int not null )

insert into  @t
select
	-- Sample rate = 6.3775%
	[Sample Taken] = [dbo].[F_RANDOM_SAMPLE](63775,newid())
from
	-- Function F_TABLE_NUMBER_RANGE
	-- available in Script Library forum
	F_TABLE_NUMBER_RANGE(1,100000)

select
	[Sample Taken],
	[Result Count] = count(*)
from
	@t a
group by
	[Sample Taken]
order by
	1,2


go

print '-----------------------------------------------------------------'
print '  Demo F_RANDOM_DATETIME function'
print '-----------------------------------------------------------------'
print ''
select
	Random_Datetime = 
	convert(varchar(23),[dbo].[F_RANDOM_DATETIME]( a.ST, a.ET,newid() ) ,121) ,
	[Start] = convert(varchar(23),a.ST ,121) ,
	[End] = convert(varchar(23),a.ET ,121) ,
	a.Comment
from
(
select ST = getdate(), ET = getdate()+2  , 
	Comment = 'Now thru 2 days from now' union all
select '20060101', '20060102' , 'One day diff' union all
select '20030101', '20030101' ,'Both times same'  union all
select '20030101', '20030108' ,'One week diff'  union all
select '20021228', '20030104' ,'One week diff'  union all
select '20010701', '20010713' ,'12 day diff'  union all
select '20010701', '20010714' ,'13 day diff'  union all
select '20010630', '20010713' ,'13 day diff'  union all
select '19901201 01:00:00.000', '19901201  02:00:00.000' ,'1 hour diff'  union all
select '19901201 01:00:33.003', '19901201  02:00:33.003' ,'1 hour diff'  union all
select '19901201 01:00:00.000', '19901201  01:30:00.000' ,'30 min diff'  union all
select '19901201 01:00:33.447', '19901201  01:30:33.447' ,'30 min diff'  union all
select '19901201 01:00:00.000', '19901201  01:05:00.000' ,'5 min diff'  union all
select '19901201 01:00:29.123', '19901201  01:05:29.123' ,'5 min diff'  union all
select '19901201 01:00:00.000', '19901201  01:01:00.000' ,'1 min diff'  union all
select '19901201 01:00:00.000', '19901201  01:00:01.000' ,'1 sec diff'  union all
select '19901201 01:00:00.000', '19901201  01:00:00.100' ,'100 ms diff'  union all
select '19901201 01:00:00.000', '19901201  01:00:00.050' ,'50 ms diff'  union all
select '19901201 01:00:00.000', '19901201  01:00:00.023' ,'23 ms diff'  union all
select '19901201 01:00:00.000', '19901201  01:00:00.020' ,'20 ms diff'  union all
select '19901201 01:00:00.000', '19901201  01:00:00.013' ,'13 ms diff'  union all
select '19901201 01:00:00.000', '19901201  01:00:00.010' ,'10 ms diff'  union all
select '19901201 01:00:00.000', '19901201  01:00:00.007' ,'7 ms diff'  union all
select '19901201 01:00:00.000', '19901201  01:00:00.003' ,'3 ms diff'  union all
select '20030101', '20030201' ,'One month diff 31 days'  union all
select '20030101', '20040101' ,'One year diff'  union all
select '20050101', '20070101' ,'Two year diff'  union all
select '20060101', '20060301' ,'2 month diff'  union all
select null, '20060101' ,'Start time null'  union all
select '20060102', null ,'End time null'  union all
select null, null ,'Both null'  union all
select '17530101', '99991231 23:59:59.997' ,'Max datetime diff'  union all
select  '99991231 23:59:59.997','17530101' ,'Max datetime diff reversed'
) a


Demo Results:


-----------------------------------------------------------------
  Demo F_RANDOM_INTEGER function
-----------------------------------------------------------------
 

(100000 row(s) affected)

Right Int   Count       
----------- ----------- 
0           9929
1           10055
2           10009
3           10082
4           9919
5           10022
6           9914
7           9985
8           10098
9           9987

(10 row(s) affected)

Million Range Count       
------------- ----------- 
0             10009
1             9985
2             10142
3             10047
4             9967
5             9907
6             10078
7             10071
8             9790
9             10004

(10 row(s) affected)

-----------------------------------------------------------------
  Demo F_RANDOM_SAMPLE function
-----------------------------------------------------------------
 

(100000 row(s) affected)

Sample Taken Result Count 
------------ ------------ 
0            93669
1            6331

(2 row(s) affected)

-----------------------------------------------------------------
  Demo F_RANDOM_DATETIME function
-----------------------------------------------------------------
 
Random_Datetime         Start                   End                     Comment                    
----------------------- ----------------------- ----------------------- -------------------------- 
2006-07-24 08:21:22.593 2006-07-23 17:54:47.283 2006-07-25 17:54:47.283 Now thru 2 days from now
2006-01-01 06:44:36.897 2006-01-01 00:00:00.000 2006-01-02 00:00:00.000 One day diff
2003-01-01 00:00:00.000 2003-01-01 00:00:00.000 2003-01-01 00:00:00.000 Both times same
2003-01-05 01:57:02.183 2003-01-01 00:00:00.000 2003-01-08 00:00:00.000 One week diff
2003-01-01 20:02:05.550 2002-12-28 00:00:00.000 2003-01-04 00:00:00.000 One week diff
2001-07-02 11:35:11.147 2001-07-01 00:00:00.000 2001-07-13 00:00:00.000 12 day diff
2001-07-02 16:39:57.433 2001-07-01 00:00:00.000 2001-07-14 00:00:00.000 13 day diff
2001-07-06 12:33:53.087 2001-06-30 00:00:00.000 2001-07-13 00:00:00.000 13 day diff
1990-12-01 01:15:42.530 1990-12-01 01:00:00.000 1990-12-01 02:00:00.000 1 hour diff
1990-12-01 01:02:21.647 1990-12-01 01:00:33.003 1990-12-01 02:00:33.003 1 hour diff
1990-12-01 01:21:06.267 1990-12-01 01:00:00.000 1990-12-01 01:30:00.000 30 min diff
1990-12-01 01:26:17.983 1990-12-01 01:00:33.447 1990-12-01 01:30:33.447 30 min diff
1990-12-01 01:00:56.327 1990-12-01 01:00:00.000 1990-12-01 01:05:00.000 5 min diff
1990-12-01 01:03:20.423 1990-12-01 01:00:29.123 1990-12-01 01:05:29.123 5 min diff
1990-12-01 01:00:21.617 1990-12-01 01:00:00.000 1990-12-01 01:01:00.000 1 min diff
1990-12-01 01:00:00.443 1990-12-01 01:00:00.000 1990-12-01 01:00:01.000 1 sec diff
1990-12-01 01:00:00.050 1990-12-01 01:00:00.000 1990-12-01 01:00:00.100 100 ms diff
1990-12-01 01:00:00.000 1990-12-01 01:00:00.000 1990-12-01 01:00:00.050 50 ms diff
1990-12-01 01:00:00.010 1990-12-01 01:00:00.000 1990-12-01 01:00:00.023 23 ms diff
1990-12-01 01:00:00.017 1990-12-01 01:00:00.000 1990-12-01 01:00:00.020 20 ms diff
1990-12-01 01:00:00.007 1990-12-01 01:00:00.000 1990-12-01 01:00:00.013 13 ms diff
1990-12-01 01:00:00.000 1990-12-01 01:00:00.000 1990-12-01 01:00:00.010 10 ms diff
1990-12-01 01:00:00.003 1990-12-01 01:00:00.000 1990-12-01 01:00:00.007 7 ms diff
1990-12-01 01:00:00.000 1990-12-01 01:00:00.000 1990-12-01 01:00:00.003 3 ms diff
2003-01-14 09:00:09.520 2003-01-01 00:00:00.000 2003-02-01 00:00:00.000 One month diff 31 days
2003-08-27 11:47:04.100 2003-01-01 00:00:00.000 2004-01-01 00:00:00.000 One year diff
2006-11-23 03:57:21.737 2005-01-01 00:00:00.000 2007-01-01 00:00:00.000 Two year diff
2006-01-12 08:50:40.717 2006-01-01 00:00:00.000 2006-03-01 00:00:00.000 2 month diff
1933-11-15 13:39:10.050 NULL                    2006-01-01 00:00:00.000 Start time null
1997-05-28 06:42:32.407 2006-01-02 00:00:00.000 NULL                    End time null
1900-01-01 01:50:42.743 NULL                    NULL                    Both null
2758-10-18 13:50:47.987 1753-01-01 00:00:00.000 9999-12-31 23:59:59.997 Max datetime diff
8426-03-24 13:51:08.407 9999-12-31 23:59:59.997 1753-01-01 00:00:00.000 Max datetime diff reversed

(33 row(s) affected)





CODO ERGO SUM

Edited by - Michael Valentine Jones on 07/23/2006 20:17:42

Kristen
Test

United Kingdom
22403 Posts

Posted - 07/23/2006 :  16:38:44  Show Profile  Reply with Quote
Nice job MVJ
Go to Top of Page

nr
SQLTeam MVY

United Kingdom
12543 Posts

Posted - 07/23/2006 :  16:41:32  Show Profile  Visit nr's Homepage  Reply with Quote
I would be very cautious about using that (in fact I wouldn't).
Functions are expected to be deterministic and can produce incorrect results if you manage to fool the server.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

Kristen
Test

United Kingdom
22403 Posts

Posted - 07/23/2006 :  16:56:41  Show Profile  Reply with Quote
"I would be very cautious about using that"

Do you mean the workaround in order to use NewID() from a function?

Kristen
Go to Top of Page

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

USA
7020 Posts

Posted - 07/23/2006 :  16:57:22  Show Profile  Reply with Quote
quote:
Originally posted by nr

I would be very cautious about using that (in fact I wouldn't).
Functions are expected to be deterministic and can produce incorrect results if you manage to fool the server.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.


I have to disagree that “Functions are expected to be deterministic”. Any function that calls an extended stored procedure is non-deterministic, and this is specifically allowed in the documentation in BOL.

However, if you are really nervous about getting the NEWID() from the view, you could modify the functions to allow the NEWID() to be passed as an input parameter.




CODO ERGO SUM
Go to Top of Page

nr
SQLTeam MVY

United Kingdom
12543 Posts

Posted - 07/23/2006 :  17:19:10  Show Profile  Visit nr's Homepage  Reply with Quote
>> I have to disagree that “Functions are expected to be deterministic”. Any function that calls an extended stored procedure is non-deterministic, and this is specifically allowed in the documentation in BOL.

Ok - maybe I should have said it won't be deterministic.


Odd - if you pass in the newid() it takes 1 sec to process my sample but 20 secs if call the view to get it.
That's running
select d, count(*) c from
(select d = dbo.F_RANDOM_DATETIME2('1 jan 2006','1 jan 2006 00:00:01', newid())
from sysobjects, sysobjects b, sysobjects c
) a
group by d
order by d

select d, count(*) c from
(select d = dbo.F_RANDOM_DATETIME('1 jan 2006','1 jan 2006 00:00:01')
from sysobjects, sysobjects b, sysobjects c
) a
group by d
order by d

Even passing in the newid() some duplicate dates are produced but with the view very few of the duplicates are removed (on my system)


==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

USA
7020 Posts

Posted - 07/23/2006 :  19:30:57  Show Profile  Reply with Quote
>>> "Even passing in the newid() some duplicate dates are produced but with the view very few of the duplicates are removed (on my system)"

I think you are correct that there is some problem with using the view; it seems to foul up the group by operation. I removed the view and added @NEWID as a parameter to each function and reposted the code.

However, I think there is still some unexplained interaction with the group by and NEWID() that is causing a problem. Notice how in the following code, the TOP expression causes the problem to go away. I would almost call this a bug in SQL server. Or at least an undocumented feature.

Edit: Could someone try the code below in SQL 2005 to see if it productes the same results?


drop table #temp
go
select
	number
into
	#temp
from
	F_TABLE_NUMBER_RANGE(1,10000) a
go
print 'No duplicates with top'
select
	 nm = rnd%10
from
	(
	select top 100 percent
		rnd=abs(convert(bigint,convert(varbinary(20),newid() )))
	from
		#temp
	) a
group by  rnd%10
order by  rnd%10
go
print 'Gives duplicates'
select
	 nm = rnd%10
from
	(
	select 
		rnd=abs(convert(bigint,convert(varbinary(20),newid() )))
	from
		#temp
	) a
group by  rnd%10
order by  rnd%10


Results:


(10000 row(s) affected)

No duplicates with top
nm                   
-------------------- 
0
1
2
3
4
5
6
7
8
9

(10 row(s) affected)

Gives duplicates
nm                   
-------------------- 
1
1
1
5
5
6
7
8
8
9

(10 row(s) affected)






CODO ERGO SUM

Edited by - Michael Valentine Jones on 07/23/2006 19:32:53
Go to Top of Page
  Previous Topic Topic Next Topic  
 New Topic  Reply to Topic
 Printer Friendly
Jump To:
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC Go To Top Of Page
This page was generated in 0.11 seconds. Powered By: Snitz Forums 2000