PRACTICAL MATHEMATICS
Sampling
Contents
- Samples
- Sample Means
- Sample Variances
Suppose that a large population has mean mu and variance sigma2. If the population is so large that it cannot be observed as a whole, the mean and variance must be estimated from random samples.
Taking a sample does not change the statistics of the population when any one of the following holds:
- we make physical measurements which can be repeated any number of times,
- the population is very large compared with the number of observations in the sample,
- each member of the population may be taken more than once in the same sample.
This method of sampling is called sampling with replacement.
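As a minimal sketch of the third case, the Python fragment below draws a sample with replacement from a small made-up population. The population values and the helper name draw_sample are assumptions for illustration only. The standard-library call random.choices picks each member independently, so the same member may appear more than once in one sample.

```python
import random

def draw_sample(population, n, rng):
    """Draw n members with replacement: every draw sees the whole population."""
    return rng.choices(population, k=n)

# Hypothetical population, made up for illustration.
population = [12, 15, 19, 22, 30]
rng = random.Random(1)            # fixed seed so the run is repeatable
sample = draw_sample(population, 8, rng)
print(sample)                     # 8 values drawn from only 5 members
```

Because the sample size (8) exceeds the population size (5), at least one member must be repeated, which sampling without replacement could not produce.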
Let x1, ... , xn be the members of a sample of size n, taken from a population with mean mu and variance sigma2. Define the sample mean m to be:
m = (x1 + ... + xn)/n.
Then the Law of Large Numbers states that, as n increases indefinitely, the mean of the sample approaches the mean of the population:
lim m = mu  (as n -> infinity).
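The behaviour of m as n grows can be seen in a small simulation. The sketch below assumes a normal population with mu = 100 and sigma = 20; these values, the seed and the helper name sample_mean are illustrative choices, not part of the statement above.

```python
import random

def sample_mean(xs):
    """m = (x1 + ... + xn)/n"""
    return sum(xs) / len(xs)

rng = random.Random(0)            # fixed seed so the run is repeatable
mu, sigma = 100.0, 20.0           # assumed population parameters

# Larger samples should give sample means closer to mu.
for n in (10, 1000, 100000):
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    print(n, round(sample_mean(xs), 2))
```

The printed means should settle closer to 100 as n increases, although any single run can fluctuate.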
The Central Limit Theorem states that, for any population with finite variance (not necessarily normal), as n increases indefinitely, the probability density phi(m) of the sample mean approaches the normal probability density with mean E[m] = mu and variance Var[m] = sigma2/n.
EXAMPLE
Suppose we have a very large population with mean mu = 100 and variance sigma2 = 400. (Then sigma = 20.) Suppose we take many samples "with replacement", and each sample contains 50 observations (n = 50). Then the distribution of the sample means is approximately normal with mean E[m] = 100 and variance Var[m] = 400/50 = 8. (The standard deviation of the sample means is sqrt(8) =approx 2.83.)
In this example about 68% of the samples have 97.2 < m < 102.8 (within one standard deviation of the mean), about 95% of the samples have 94.3 < m < 105.7 (within two standard deviations), and about 99.7% of the samples have 91.5 < m < 108.5 (within three standard deviations).
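The interval limits in this example follow directly from Var[m] = sigma2/n. The short sketch below recomputes them. The function name mean_intervals is a hypothetical helper, and the use of 1, 2 and 3 standard deviations for the 68%, 95% and 99.7% ranges is the usual rule of thumb for a normal distribution.

```python
import math

def mean_intervals(mu, sigma2, n):
    """Return (k, low, high) for k = 1, 2, 3 standard deviations of the sample mean."""
    sd = math.sqrt(sigma2 / n)    # standard deviation of m
    return [(k, mu - k * sd, mu + k * sd) for k in (1, 2, 3)]

for k, low, high in mean_intervals(100, 400, 50):
    print(f"{k} sd: {low:.1f} < m < {high:.1f}")
# 1 sd: 97.2 < m < 102.8
# 2 sd: 94.3 < m < 105.7
# 3 sd: 91.5 < m < 108.5
```

The same function can be applied to other values of mu, sigma2 and n.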
EXERCISE
Find the distribution of m for samples containing 80 observations taken from a very large population having mean mu = 50 and variance sigma2 = 500.
The expected variance in a sample is slightly less than the variance of the population. Let s2 be the sample variance defined by:
s2 = [(x1 - m)^2 + ... + (xn - m)^2]/n.
Then the expected value of s2 is related to the population variance sigma2 by the equation:
E[s2] = (n - 1).sigma2/n.
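This relation can be checked exactly on a small case by listing every possible sample. The sketch below uses a made-up population of five values and samples of size n = 3 drawn with replacement; the population, the sample size and the helper names are assumptions for illustration. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

def mean(xs):
    return Fraction(sum(xs), len(xs))

def variance_n(xs):
    """Variance with divisor n, as in the definition of s2."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

population = [1, 2, 3, 4, 6]      # hypothetical values, chosen only for illustration
n = 3

sigma2 = variance_n(population)
# Every ordered sample of size n drawn with replacement is equally likely.
samples = list(product(population, repeat=n))
expected_s2 = sum(variance_n(s) for s in samples) / len(samples)

print(expected_s2, Fraction(n - 1, n) * sigma2)   # the two values agree exactly
```

Enumeration is only practical for very small populations and sample sizes, since the number of samples grows as the population size to the power n.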
The distribution of s2 in large samples is given by the following theorem, stated here for a normal population. (For other populations the variance of s2 also depends on the fourth moment of the population.)
As n approaches infinity, the probability density phi(s2) of s2 approaches the normal probability density function with mean E[s2] = sigma2 and variance Var[s2] = 2.(sigma2)^2/n.
EXAMPLE
Suppose that a very large normal population has variance sigma2 = 400. Suppose that many samples with n = 50 are taken from this population. Then, in this collection of samples, we have E[s2] = (50 - 1)×400/50 = 392 (=approx 400), and Var[s2] =approx 2×400^2/50 = 6400. (The standard deviation of s2 is sqrt(6400) = 80.)
In this example about 68% of the samples have 392 - 80 < s2 < 392 + 80, which gives 17.7 < s < 21.7; about 95% of the samples have 392 - 160 < s2 < 392 + 160, which gives 15.2 < s < 23.5; and about 99.7% of the samples have 392 - 240 < s2 < 392 + 240, which gives 12.3 < s < 25.1.
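A simulation gives an independent check on figures of this kind. The sketch below assumes a normal population with mean 100 (the example fixes only the variance, so the mean and the normal shape are assumptions), draws 2000 samples of size 50, and reports the mean and standard deviation of s2 across them. The number of trials, the seed and the helper name sample_variance are illustrative choices.

```python
import random
import statistics

def sample_variance(xs):
    """s2 with divisor n, matching the definition in the text."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(0)            # fixed seed so the run is repeatable
mu, sigma, n = 100.0, 20.0, 50    # mu and the normal shape are assumptions
trials = 2000

# One value of s2 per simulated sample.
s2_values = [
    sample_variance([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(trials)
]

print("mean of s2:", round(statistics.mean(s2_values), 1))
print("standard deviation of s2:", round(statistics.pstdev(s2_values), 1))
```

The printed values can be compared with E[s2] and with the square root of Var[s2] in the example.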
EXERCISE
Find the expected value of s2 for samples of size n = 80 taken from a very large population having variance sigma2 = 500.
By R. H. B. Exell, 1998. King Mongkut's University of Technology Thonburi.