Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Random Record + Different Probability

Author  Topic 

shapper
Constraint Violating Yak Guru

450 Posts

Posted - 2011-04-24 : 17:30:10
Hello,

I am planning a SQL table where later I need to:

1 - Get a random record.
I know I can do this ordering by Guid.

2 - Change the probability of each record to be selected.
Using a scale with the multiples of 10 would be enough:
10%, 20%, ..., 90%, 100%.

Is this possible?

Thank You,
Miguel

nigelrivett
Master Smack Fu Yak Hacker

3385 Posts

Posted - 2011-04-24 : 18:50:47
How do you want to specify the multiples for each row? Do you have a very small number and can do it individually?

==========================================
Cursors are useful if you don't know sql.
SSIS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2011-04-24 : 21:26:06
To follow up on Nigel's question, if I understand your requirement:
:: when probability reaches 20%, one out of 5 records will be selected.
:: when probability is 100% all the records will be selected.

If that is true, could you do something simple like this?

declare @TotalRecords int; set @TotalRecords = 1000;
declare @probability float; set @probability = 0.2; -- 20% probability that a record will be selected

select top (cast(@TotalRecords*@probability as int)) * from YourTable order by newid();
Go to Top of Page

shapper
Constraint Violating Yak Guru

450 Posts

Posted - 2011-04-24 : 22:45:43
Hi,

Let me try to explain better:

Each table's row has a column named weight.
That weight can be 10, 20, 30, ..., 90, 100.

So I want to randomly select ONE row but where the rows with weight 100 have a higher probability of being selected than the rows with weight 10.

Does this make sense?

Thank You,
Miguel
Go to Top of Page

khtan
In (Som, Ni, Yak)

17689 Posts

Posted - 2011-04-24 : 23:50:20
is this good enough ? Hopefully your table is not too big and the "weight" is within reasonable limit.


declare @sample table
(
row_no int identity(1,1),
weight int,
cnt int default 0
)

insert into @sample (weight) select 1
insert into @sample (weight) select 2
insert into @sample (weight) select 5
insert into @sample (weight) select 6
insert into @sample (weight) select 8

declare @i int,
@row_no int

select @i = 1

-- runs it 1000 time
while @i <= 1000
begin
select top 1 @row_no = row_no
from @sample s
cross join numbers n
where n.number between 1 and s.weight
order by newid()

update @sample
set cnt = cnt + 1
where row_no = @row_no

select @i = @i + 1
end

select *,
weight_ratio = convert(decimal(10,2), weight * 100.0 / sum(weight) over()),
cnt_ratio = convert(decimal(10,2), cnt * 100.0 / sum(cnt) over())
from @sample


Note : numbers is just a tally table.


KH
[spoiler]Time is always against us[/spoiler]

Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2011-04-25 : 07:27:26
Here is my take on it. I am using the same data as KH, but its a different approach. What I am doing is:

1. Calculate the running total of the weights. The range between the running total of a row and the running total of the next row "belongs" to the row.

2. Calculate a random number less than max(running_total) and pick the row to which the random number "belongs".
create	table #tmp
(
row_no int identity(1,1),
weight int,
cnt int default 0
);
insert into #tmp (weight) select 1;
insert into #tmp (weight) select 2;
insert into #tmp (weight) select 5;
insert into #tmp (weight) select 6;
insert into #tmp (weight) select 8;


with CTE as
(
select row_no, weight as running_total from #tmp where row_no = 1
union all
select t.row_no, t.weight+c.running_total
from #tmp t inner join CTE c on c.row_no+1 = t.row_no
)
update t set
cnt = running_total
from
#tmp t inner join CTE c on c.row_no = t.row_no;

select * from #tmp;

create unique index idx_cnt on #tmp(cnt);

-- pick record to select.
declare @n int
select @n = max(cnt) from #tmp;

select top 1 *
from #tmp
where cnt >= abs(cast(cast(newid() as varbinary) as bigint))%(@n+1)
order by cnt;

drop table #tmp

Edit: Added where row_no = 1 in the anchor portion of the CTE.
Go to Top of Page

khtan
In (Som, Ni, Yak)

17689 Posts

Posted - 2011-04-25 : 08:03:20
this is much more efficient than my method


KH
[spoiler]Time is always against us[/spoiler]

Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2011-04-25 : 12:19:09
But let us see if shapper thinks if either approach is any good before declaring victory
Go to Top of Page
   

- Advertisement -