RosettaCodeData/Task/Knuths-algorithm-S/00DESCRIPTION

This is a method of randomly sampling n items from a set of M items with equal probability, where M >= n and M, the number of items, is unknown until the end.
This means that equal-probability sampling must be maintained for every successive item beyond the n-th as it becomes available (although the contents of successive samples can change).
;The algorithm:
# Select the first n items as the sample as they become available.
# For the i-th item where i > n, keep it with probability n/i. If it is not kept, the sample remains the same. If it is kept, have it replace one of the previously selected n items of the sample, chosen uniformly at random (each with probability 1/n).
# Repeat #2 for any subsequent items.
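
The three steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name <code>algorithm_s</code> and its <code>rng</code> parameter are assumptions made for this example and are not part of the task.

```python
import random


def algorithm_s(items, n, rng=random):
    """Return a random sample of up to n items from an iterable of unknown length.

    `algorithm_s` and `rng` are hypothetical names chosen for this sketch.
    """
    sample = []
    for i, item in enumerate(items, start=1):
        if i <= n:
            # Step 1: the first n items fill the sample as they arrive.
            sample.append(item)
        elif rng.random() < n / i:
            # Step 2: keep the i-th item with probability n/i and let it
            # replace one of the n slots, each chosen with probability 1/n.
            sample[rng.randrange(n)] = item
        # Otherwise the sample is left unchanged; step 3 is the loop itself.
    return sample
```

Because the items are consumed one at a time, the sketch never needs to know M in advance; if fewer than n items arrive, it simply returns all of them.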
;The Task:
# Create a function <code>s_of_n_creator</code> that, given <math>n</math>, the maximum sample size, returns a function <code>s_of_n</code> that takes one parameter, <code>item</code>.
# Function <code>s_of_n</code>, when called with successive items, returns on each call an equi-weighted random sample of up to n of the items seen so far, calculated using Knuth's Algorithm S.
# Test your functions by printing and showing the frequency of occurrences of the selected digits from 100,000 repetitions of:
:# Use the s_of_n_creator with n == 3 to generate an s_of_n.
:# Call s_of_n with each of the digits 0 to 9 in order, keeping the three digits returned by its last call, the one with argument item=9.
Note: A class taking n and generating a callable instance/function might also be used.
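
One possible shape for the task, sketched in Python using a closure. The names <code>s_of_n_creator</code> and <code>s_of_n</code> come from the task; the helper <code>digit_frequencies</code> and its parameters are assumptions introduced only for this illustration.

```python
import random
from collections import Counter


def s_of_n_creator(n):
    """Return a function s_of_n(item) that maintains a sample of up to n items."""
    sample = []
    seen = 0  # number of items passed to s_of_n so far

    def s_of_n(item):
        nonlocal seen
        seen += 1
        if seen <= n:
            # The first n items are always kept.
            sample.append(item)
        elif random.random() < n / seen:
            # Keep the new item with probability n/seen, replacing a
            # uniformly chosen member of the current sample.
            sample[random.randrange(n)] = item
        # Return a copy so callers cannot mutate the internal state.
        return list(sample)

    return s_of_n


def digit_frequencies(repetitions=100_000, n=3):
    """Hypothetical test helper: count how often each digit ends up in the final sample."""
    counts = Counter()
    for _ in range(repetitions):
        s_of_n = s_of_n_creator(n)
        for digit in range(10):
            final_sample = s_of_n(digit)
        # Only the sample returned by the last call (item=9) is counted.
        counts.update(final_sample)
    return counts
```

Calling <code>digit_frequencies()</code> and printing the sorted counts should show each digit selected roughly 30,000 times, since each of the ten digits has a 3/10 chance of being in the final sample; the exact counts vary from run to run.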
;Reference:
* The Art of Computer Programming, Vol. 2, Section 3.4.2, p. 142
;Cf.
* [[One of n lines in a file]]
* [[Accumulator factory]]