RosettaCodeData/Task/Statistics-Basic/00-TASK.txt

35 lines
1.9 KiB
Plaintext

[[Statistics|Statistics]] is all about large groups of numbers.
When talking about a set of sampled data, most frequently used is their [[wp:Mean|mean value]] and [[wp:Standard_deviation|standard deviation (stddev)]].
If you have set of data <math>x_i</math> where <math>i = 1, 2, \ldots, n\,\!</math>, the mean is <math>\bar{x}\equiv {1\over n}\sum_i x_i</math>, while the stddev is <math>\sigma\equiv\sqrt{{1\over n}\sum_i \left(x_i - \bar x \right)^2}</math>.
When examining a large quantity of data, one often uses a [[wp:Histogram|histogram]], which shows the counts of data samples falling into a prechosen set of intervals (or bins).
When plotted, often as bar graphs, it visually indicates how often each data value occurs.
'''Task''' Using your language's random number routine, generate real numbers in the range of [0, 1]. It doesn't matter if you chose to use open or closed range.
Create 100 of such numbers (i.e. sample size 100) and calculate their mean and stddev.
Do so for sample size of 1,000 and 10,000, maybe even higher if you feel like.
Show a histogram of any of these sets.
Do you notice some patterns about the standard deviation?
'''Extra''' Sometimes so much data need to be processed that it's impossible to keep all of them at once. Can you calculate the mean, stddev and histogram of a trillion numbers? (You don't really need to do a trillion numbers, just show how it can be done.)
;Hint:
For a finite population with equal probabilities at all points, one can derive:
:<math>\overline{(x - \overline{x})^2} = \overline{x^2} - \overline{x}^2</math>
Or, more verbosely:
:<math>
\frac{1}{N}\sum_{i=1}^N(x_i-\overline{x})^2 = \frac{1}{N} \left(\sum_{i=1}^N x_i^2\right) - \overline{x}^2.
</math>
{{task heading|See also}}
* [[Statistics/Normal_distribution|Statistics/Normal distribution]]
{{Related tasks/Statistical measures}}
<br><hr>