| Author |
Topic |
|
elwoos
Master Smack Fu Yak Hacker
2052 Posts |
Posted - 2004-03-02 : 05:11:43
|
| I'm not sure there is an answer to this but I'd be interested in people's comments.I'm trying to create a set of test data, random names etc. My problem is that some of the fields that I am trying to populate may have a finite set of values. The system I am using is to randomly pick from this set. For some fields this works fine, but for others it's not ideal as their should be some weighting and I'm not sure how to introduce this. e.g. I have a field which is only allowed to have the values H,C or V. In the live data they may be distributed as follows H 10%, C 30%, V 60%. How can I produce random values with these weightings?thankssteveSteve no function beer well without |
|
|
ditch
Master Smack Fu Yak Hacker
1466 Posts |
Posted - 2004-03-02 : 05:46:13
|
| This is not set based I know,but if you try it , it does the trick.There is a different select for each new row and column which means that the same random number is not generated.What do you think?--**************************************************************************************DECLARE @a INTDECLARE @LetInt INTDECLARE @LET VARCHAR(1)DECLARE @Perc INTSET @a = 0WHILE @a <= 100BEGIN SET @LetInt = (select ceiling((rand() * 2) - 0.499999999999999999999999)) SET @Perc = (select ceiling((rand() *100) - 0.499999999999999999999999)) IF @LetInt = 0 BEGIN SET @LET = 'H' END ELSE BEGIN IF @LetInt = 1 BEGIN SET @LET = 'C' END ELSE BEGIN SET @LET = 'V' END END SELECT @LET + ' ' + STR(@Perc, 3) + ' %' SET @a = @a + 1END--************************************************************************************Duane. |
 |
|
|
ditch
Master Smack Fu Yak Hacker
1466 Posts |
Posted - 2004-03-02 : 05:52:33
|
| OOPS, sorry I misunderstood the question, please ignore that!Duane. |
 |
|
|
ditch
Master Smack Fu Yak Hacker
1466 Posts |
Posted - 2004-03-02 : 07:03:53
|
| This does work though,What do you think?--***************************************************************DECLARE @a INTDECLARE @b INTDECLARE @HLim INTDECLARE @CLim INTDECLARE @VLim INTSET @a = 0SET @HLim = 10SET @CLim = 30SET @VLim = 60CREATE TABLE #Main(MainID INT, Letter VARCHAR(1));SET @a = 1SET @b = 1WHILE @b <= @HLimBEGIN INSERT INTO #Main VALUES(@a, 'H') SET @b= @b + 1 SET @a = @a + 1ENDSET @b = 1WHILE @b <= @CLimBEGIN INSERT INTO #Main VALUES(@a, 'C') SET @b= @b + 1 SET @a = @a + 1ENDSET @b = 1WHILE @b <= @VLimBEGIN INSERT INTO #Main VALUES(@a, 'V') SET @b= @b + 1 SET @a = @a + 1ENDSELECT Letter from #Main where MainID = ceiling((rand() *100) - 0.499999999999999999999999)Duane. |
 |
|
|
SamC
White Water Yakist
3467 Posts |
Posted - 2004-03-02 : 08:22:13
|
| You might be able to generate random values using set based methods by populating a source table with the proper distribution:INSERT INTO MyTable (Letter)SELECT 'H'UNION ALLSELECT 'C'UNION ALLSELECT 'C'UNION ALLSELECT 'C'UNION ALLSELECT 'V'UNION ALLSELECT 'V'UNION ALLSELECT 'V'UNION ALLSELECT 'V'UNION ALLSELECT 'V'UNION ALLSELECT 'V'-- Now, to select a letter with the frequency H 10%, C 30%, V 60%, all you need isSELECT Top 1 Letter FROM MyTable ORDER BY NEWID()Sam |
 |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2004-03-02 : 11:58:36
|
| Nice one, Sammy !- Jeff |
 |
|
|
SamC
White Water Yakist
3467 Posts |
Posted - 2004-03-02 : 14:32:37
|
| Thanks for the compliment Doctor.Using the table of 10 items, can you build a select that returns an arbitrary number of letters confirming to the distribution? |
 |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2004-03-02 : 14:42:04
|
peformance may not be great, with a tally table you could say:declare @LetterCount int;-- @LetterCount is how many letters you'd like to return:set @letterCount 30;set @@RowCount @LetterCountselect Letterfrom Letterscross join Tallywhere Tally.N <= (@LetterCount / (select count(*) from Letters)) + 1order by newid() I think that would work ... the tally table join ensure you have enough rows to pick from...i.e, if there are 10 letters but you want 30 results, you need to cross join with at least 3 rows to return at least 30 results ....performance might be horrible, but it should work. and it uses a CROSS JOIN so I'm happy. - Jeff |
 |
|
|
elwoos
Master Smack Fu Yak Hacker
2052 Posts |
Posted - 2004-03-03 : 04:21:25
|
| Thanks everyone I'll give those a go. Performance isn't a big issue as it's a one shot thing. I'm not sure about a tally table as suggested by Sammy as there are other fields that I need to do this for with many possible values (potentially a hundred or two), though I think it will work quite well for the field that I gave as an examplethanks again steveSteve no function beer well without |
 |
|
|
SamC
White Water Yakist
3467 Posts |
Posted - 2004-03-03 : 10:29:52
|
Tally ho Jeff. I think the solution you gave is fine, but the cross join actually creates 10 x N rows, where only N are needed.Rowcount takes care of this, and the additional rows in the 10xN set may actually help the extracted rowset "distribution frequency". In other words, you should see statistical variances around the means for each letter. Random doesn't mean exactly 10%, so starting with 10xN would move the result set to something like a real statistical distribution.Where is Arnold ? I'm sure he has an obscure post that has solved this problem previously (not that I would understand it).Actually, Steve is doing an INSERT, so building a rowset may not be needed.INSERT INTO MyCooked-Up-Data (ColA, ColB, Letter)SELECT ColA, ColB, (SELECT Top 1 Letter FROM Letters ORDER BY NEWID()) FROM MyBaseData As long as NEWID is really random, this otter give a reasonable result. |
 |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2004-03-04 : 03:57:34
|
| SELECT SUBSTRING('HCCCVVVVVV', 1 + CAST(FLOOR(RAND(CAST(NEWID() AS binary(4)))* 10) AS int), 1) |
 |
|
|
SamC
White Water Yakist
3467 Posts |
Posted - 2004-03-04 : 07:48:17
|
| Hello Arnold,Better solution - no working table. Interesting random number seed (never seen that seed myself - thanks).Nice of you to drop by.Sam |
 |
|
|
|