Random Record + Different Probability

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2008 Forums

Transact-SQL (2008)

Random Record + Different Probability

Author

Topic

shapper
Constraint Violating Yak Guru

450 Posts

Posted - 2011-04-24 : 17:30:10

Hello,

I am planning a SQL table where later I need to:

1 - Get a random record.
I know I can do this ordering by Guid.

2 - Change the probability of each record to be selected.
Using a scale with the multiples of 10 would be enough:
10%, 20%, ..., 90%, 100%.

Is this possible?

Thank You,
Miguel

nigelrivett
Master Smack Fu Yak Hacker

3385 Posts

Posted - 2011-04-24 : 18:50:47

How do you want to specify the multiples for each row? Do you have a very small number and can do it individually?

==========================================
Cursors are useful if you don't know sql.
SSIS can be used in a similar way.
Beer is not cold and it isn't fizzy.

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2011-04-24 : 21:26:06

To follow up on Nigel's question, if I understand your requirement:
:: when probability reaches 20%, one out of 5 records will be selected.
:: when probability is 100% all the records will be selected.

If that is true, could you do something simple like this?


declare @TotalRecords int; set @TotalRecords = 1000;
declare @probability float; set @probability = 0.2; -- 20% probability that a record will be selected

select top (cast(@TotalRecords*@probability as int)) * from YourTable order by newid();

shapper
Constraint Violating Yak Guru

450 Posts

Posted - 2011-04-24 : 22:45:43

Hi,

Let me try to explain better:

Each table's row has a column named weight.
That weight can be 10, 20, 30, ..., 90, 100.

So I want to randomly select ONE row but where the rows with weight 100 have a higher probability of being selected than the rows with weight 10.

Does this make sense?

Thank You,
Miguel

khtan
In (Som, Ni, Yak)

17689 Posts

Posted - 2011-04-24 : 23:50:20

is this good enough ? Hopefully your table is not too big and the "weight" is within reasonable limit.


declare	@sample table
(
	row_no	int	identity(1,1),
	weight	int,
	cnt	int	default 0
)

insert into @sample (weight) select	1
insert into @sample (weight) select	2
insert into @sample (weight) select	5
insert into @sample (weight) select	6
insert into @sample (weight) select	8

declare	@i	int,
	@row_no	int

select	@i	= 1

-- runs it 1000 time
while	@i 	<= 1000
begin
	select	top 1 @row_no = row_no
	from	@sample s
		cross join numbers n	
        where   n.number between 1 and s.weight
	order by newid()

	update	@sample
	set	cnt	= cnt + 1
	where	row_no	= @row_no

	select	@i = @i + 1
end

select	*, 
	weight_ratio 	= convert(decimal(10,2), weight * 100.0 / sum(weight) over()),
	cnt_ratio	= convert(decimal(10,2), cnt * 100.0 / sum(cnt) over())
from	@sample

Note : numbers is just a tally table.

KH
[spoiler]Time is always against us[/spoiler]

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2011-04-25 : 07:27:26

Here is my take on it. I am using the same data as KH, but its a different approach. What I am doing is:

1. Calculate the running total of the weights. The range between the running total of a row and the running total of the next row "belongs" to the row.

2. Calculate a random number less than max(running_total) and pick the row to which the random number "belongs".

create	table #tmp
(
	row_no	int	identity(1,1),
	weight	int,
	cnt	int	default 0
);
insert into #tmp (weight) select	1;
insert into #tmp (weight) select	2;
insert into #tmp (weight) select	5;
insert into #tmp (weight) select	6;
insert into #tmp (weight) select	8;


with CTE as
(
	select  row_no, weight as running_total from #tmp where row_no = 1
	union all
	select t.row_no, t.weight+c.running_total
	from #tmp t inner join CTE c on c.row_no+1 = t.row_no
)
update t set
	cnt = running_total
from
	#tmp t inner join CTE c on c.row_no = t.row_no;
	
select * from #tmp;

create unique index idx_cnt on #tmp(cnt);

-- pick record to select.
declare @n int
select @n = max(cnt) from #tmp;

select top 1 *
from #tmp 
where cnt >=  abs(cast(cast(newid() as varbinary) as bigint))%(@n+1)
order by cnt;

drop table #tmp