26 lines
1.5 KiB
Plaintext
26 lines
1.5 KiB
Plaintext
A method of choosing a line randomly from a file:
|
|
::* Without reading the file more than once
|
|
::* When substantial parts of the file cannot be held in memory
|
|
::* Without knowing how many lines are in the file
|
|
Is to:
|
|
::* keep the first line of the file as a possible choice, then
|
|
::* Read the second line of the file if possible and make it the possible choice if a uniform random value between zero and one is less than 1/2.
|
|
::* Read the third line of the file if possible and make it the possible choice if a uniform random value between zero and one is less than 1/3.
|
|
::* ...
|
|
::* Read the Nth line of the file if possible and make it the possible choice if a uniform random value between zero and one is less than 1/N
|
|
|
|
::* Return the computed possible choice when no further lines exist in the file.
|
|
|
|
|
|
;Task:
|
|
# Create a function/method/routine called <code>one_of_n</code> that given <code>n</code>, the number of actual lines in a file, follows the algorithm above to return an integer - the line number of the line chosen from the file. <br>The number returned can vary, randomly, in each run.
|
|
# Use <code>one_of_n</code> in a ''simulation'' to find what would be the chosen line of a 10-line file simulated 1,000,000 times.
|
|
# Print and show how many times each of the 10 lines is chosen as a rough measure of how well the algorithm works.
|
|
|
|
|
|
Note: You may choose a smaller number of repetitions if necessary, but mention this up-front.
|
|
|
|
Note: This is a specific version of a Reservoir Sampling algorithm: https://en.wikipedia.org/wiki/Reservoir_sampling
|
|
<br><br>
|
|
|