Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 Not Normal

Author  Topic 

elwoos
Master Smack Fu Yak Hacker

2052 Posts

Posted - 2004-03-02 : 05:11:43
I'm not sure there is an answer to this but I'd be interested in people's comments.

I'm trying to create a set of test data, random names etc. My problem is that some of the fields that I am trying to populate may have a finite set of values. The system I am using is to randomly pick from this set. For some fields this works fine, but for others it's not ideal as their should be some weighting and I'm not sure how to introduce this.

e.g. I have a field which is only allowed to have the values H,C or V. In the live data they may be distributed as follows H 10%, C 30%, V 60%. How can I produce random values with these weightings?

thanks

steve

Steve no function beer well without

ditch
Master Smack Fu Yak Hacker

1466 Posts

Posted - 2004-03-02 : 05:46:13
This is not set based I know,
but if you try it , it does the trick.

There is a different select for each new row and column which means that the same random number is not generated.

What do you think?

--**************************************************************************************

DECLARE @a INT
DECLARE @LetInt INT
DECLARE @LET VARCHAR(1)
DECLARE @Perc INT

SET @a = 0

WHILE @a <= 100
BEGIN
SET @LetInt = (select ceiling((rand() * 2) - 0.499999999999999999999999))
SET @Perc = (select ceiling((rand() *100) - 0.499999999999999999999999))

IF @LetInt = 0
BEGIN
SET @LET = 'H'
END
ELSE
BEGIN
IF @LetInt = 1
BEGIN
SET @LET = 'C'
END
ELSE
BEGIN
SET @LET = 'V'
END
END
SELECT @LET + ' ' + STR(@Perc, 3) + ' %'
SET @a = @a + 1
END

--************************************************************************************



Duane.
Go to Top of Page

ditch
Master Smack Fu Yak Hacker

1466 Posts

Posted - 2004-03-02 : 05:52:33
OOPS, sorry I misunderstood the question, please ignore that!


Duane.
Go to Top of Page

ditch
Master Smack Fu Yak Hacker

1466 Posts

Posted - 2004-03-02 : 07:03:53
This does work though,
What do you think?

--***************************************************************
DECLARE @a INT
DECLARE @b INT

DECLARE @HLim INT
DECLARE @CLim INT
DECLARE @VLim INT


SET @a = 0

SET @HLim = 10
SET @CLim = 30
SET @VLim = 60

CREATE TABLE #Main(MainID INT, Letter VARCHAR(1));

SET @a = 1
SET @b = 1

WHILE @b <= @HLim
BEGIN
INSERT INTO #Main VALUES(@a, 'H')
SET @b= @b + 1
SET @a = @a + 1
END

SET @b = 1
WHILE @b <= @CLim
BEGIN
INSERT INTO #Main VALUES(@a, 'C')
SET @b= @b + 1
SET @a = @a + 1
END


SET @b = 1
WHILE @b <= @VLim
BEGIN
INSERT INTO #Main VALUES(@a, 'V')
SET @b= @b + 1
SET @a = @a + 1
END


SELECT Letter from #Main where MainID = ceiling((rand() *100) - 0.499999999999999999999999)

Duane.
Go to Top of Page

SamC
White Water Yakist

3467 Posts

Posted - 2004-03-02 : 08:22:13
You might be able to generate random values using set based methods by populating a source table with the proper distribution:

INSERT INTO MyTable (Letter)
SELECT 'H'
UNION ALL
SELECT 'C'
UNION ALL
SELECT 'C'
UNION ALL
SELECT 'C'
UNION ALL
SELECT 'V'
UNION ALL
SELECT 'V'
UNION ALL
SELECT 'V'
UNION ALL
SELECT 'V'
UNION ALL
SELECT 'V'
UNION ALL
SELECT 'V'

-- Now, to select a letter with the frequency H 10%, C 30%, V 60%, all you need is

SELECT Top 1 Letter FROM MyTable ORDER BY NEWID()

Sam
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2004-03-02 : 11:58:36
Nice one, Sammy !

- Jeff
Go to Top of Page

SamC
White Water Yakist

3467 Posts

Posted - 2004-03-02 : 14:32:37
Thanks for the compliment Doctor.

Using the table of 10 items, can you build a select that returns an arbitrary number of letters confirming to the distribution?

Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2004-03-02 : 14:42:04
peformance may not be great, with a tally table you could say:


declare @LetterCount int;
-- @LetterCount is how many letters you'd like to return:
set @letterCount 30;

set @@RowCount @LetterCount

select
Letter
from
Letters
cross join
Tally
where
Tally.N <= (@LetterCount / (select count(*) from Letters)) + 1
order by
newid()


I think that would work ... the tally table join ensure you have enough rows to pick from...

i.e, if there are 10 letters but you want 30 results, you need to cross join with at least 3 rows to return at least 30 results ....

performance might be horrible, but it should work. and it uses a CROSS JOIN so I'm happy.


- Jeff
Go to Top of Page

elwoos
Master Smack Fu Yak Hacker

2052 Posts

Posted - 2004-03-03 : 04:21:25
Thanks everyone I'll give those a go. Performance isn't a big issue as it's a one shot thing.

I'm not sure about a tally table as suggested by Sammy as there are other fields that I need to do this for with many possible values (potentially a hundred or two), though I think it will work quite well for the field that I gave as an example

thanks again

steve

Steve no function beer well without
Go to Top of Page

SamC
White Water Yakist

3467 Posts

Posted - 2004-03-03 : 10:29:52
Tally ho Jeff. I think the solution you gave is fine, but the cross join actually creates 10 x N rows, where only N are needed.

Rowcount takes care of this, and the additional rows in the 10xN set may actually help the extracted rowset "distribution frequency". In other words, you should see statistical variances around the means for each letter. Random doesn't mean exactly 10%, so starting with 10xN would move the result set to something like a real statistical distribution.

Where is Arnold ? I'm sure he has an obscure post that has solved this problem previously (not that I would understand it).

Actually, Steve is doing an INSERT, so building a rowset may not be needed.

INSERT INTO MyCooked-Up-Data (ColA, ColB, Letter)
SELECT ColA, ColB, (SELECT Top 1 Letter FROM Letters ORDER BY NEWID())
FROM MyBaseData


As long as NEWID is really random, this otter give a reasonable result.

Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2004-03-04 : 03:57:34
SELECT SUBSTRING('HCCCVVVVVV', 1 + CAST(FLOOR(RAND(CAST(NEWID() AS binary(4)))* 10) AS int), 1)
Go to Top of Page

SamC
White Water Yakist

3467 Posts

Posted - 2004-03-04 : 07:48:17
Hello Arnold,

Better solution - no working table. Interesting random number seed (never seen that seed myself - thanks).

Nice of you to drop by.

Sam
Go to Top of Page
   

- Advertisement -