This is a method of randomly sampling n items from a set of M items with equal probability, where M >= n and M, the number of items, is not known until the end.
This means that equal-probability sampling must be maintained for every successive item beyond the n-th as it becomes available (although the contents of successive samples can change).
;The algorithm:
:* Select the first n items as the sample as they become available;
:* For the i-th item where i > n, keep it with probability n/i. If it is not kept, the sample remains unchanged. If it is kept, have it replace one of the n previously selected items of the sample, each chosen with probability 1/n.
:* Repeat the &nbsp; 2<sup>nd</sup> step &nbsp; for any subsequent items.
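The steps above can be sketched in Python as follows. This is only a minimal illustration, not a reference implementation: the function name <code>reservoir_sample</code> and its parameters are hypothetical, and it assumes the items arrive as an ordinary iterable.

```python
import random

def reservoir_sample(items, n, rng=random):
    """Return a uniform random sample of up to n items from an iterable.

    Hypothetical helper for illustration; `rng` is assumed to offer
    random() and randrange(), as the standard `random` module does.
    """
    sample = []
    for i, item in enumerate(items, start=1):
        if i <= n:
            # Step 1: the first n items fill the sample directly.
            sample.append(item)
        elif rng.random() < n / i:
            # Step 2: keep the i-th item with probability n/i and let it
            # replace one existing member, each chosen with probability 1/n.
            sample[rng.randrange(n)] = item
        # Otherwise the sample is left unchanged.
    return sample

if __name__ == "__main__":
    print(reservoir_sample(range(10), 3))
```

Because the total number of items is never consulted, the same loop works for a stream whose length is unknown in advance.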
;The Task:
:* Create a function <code>s_of_n_creator</code> that, given <math>n</math>, the maximum sample size, returns a function <code>s_of_n</code> that takes one parameter, <code>item</code>.
:* Function <code>s_of_n</code>, when called with successive items, returns on each call an equi-weighted random sample of up to n of the items it has received so far, calculated using Knuth's Algorithm S.
:* Test your functions by printing and showing the frequency of occurrences of the selected digits from 100,000 repetitions of:
:::# Use <code>s_of_n_creator</code> with n == 3 to generate an <code>s_of_n</code>.
:::# Call <code>s_of_n</code> with each of the digits 0 to 9 in order, keeping the three digits returned by its last call (the one with argument item=9) as the random sample.
Note: A class taking n and generating a callable instance/function might also be used.
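One way the task might be approached in Python is sketched below, using a closure to hold the sample between calls. The names <code>s_of_n_creator</code> and <code>s_of_n</code> come from the task; the helper <code>digit_frequencies</code> and its parameters are hypothetical, introduced only for this illustration.

```python
import random
from collections import Counter

def s_of_n_creator(n):
    """Return a function that keeps a running sample of up to n items."""
    sample = []
    seen = 0  # number of items offered so far

    def s_of_n(item):
        nonlocal seen
        seen += 1
        if seen <= n:
            # The first n items are always kept.
            sample.append(item)
        elif random.random() < n / seen:
            # Keep the new item with probability n/seen, replacing
            # a uniformly chosen member of the current sample.
            sample[random.randrange(n)] = item
        return list(sample)  # a copy, so callers cannot alter the state

    return s_of_n

def digit_frequencies(repetitions=100_000, n=3):
    """Hypothetical test helper: count how often each digit is sampled."""
    counts = Counter()
    for _ in range(repetitions):
        s_of_n = s_of_n_creator(n)
        for digit in range(10):
            result = s_of_n(digit)
        # Only the sample returned by the last call (item=9) is counted.
        counts.update(result)
    return counts

if __name__ == "__main__":
    freq = digit_frequencies()
    for digit in range(10):
        print(digit, freq[digit])
```

If the sampling is uniform, each digit should appear roughly 30,000 times, since every repetition selects 3 of the 10 digits.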
;Reference:
* The Art of Computer Programming, Vol 2, 3.4.2 p.142
;Related tasks:
* [[One of n lines in a file]]
* [[Accumulator factory]]
<br><br>