Population vs Sample Standard Deviation
In statistics, several indices are used to describe a data set in terms of its central tendency, dispersion and skewness. Standard deviation is one of the most common measures of the dispersion of data about the center of the data set.
Due to practical difficulties, it is usually not possible to use data from the whole population when a hypothesis is tested. Therefore, we use data values from samples to make inferences about the population. In such a situation, the statistics computed from the sample are called estimators, since they estimate the values of the population parameters.
Unbiased estimators are preferred in inference wherever they are available. An estimator is said to be unbiased if its expected value is equal to the population parameter. For example, we use the sample mean as an unbiased estimator of the population mean. (Mathematically, it can be shown that the expected value of the sample mean is equal to the population mean.) In the case of dispersion, the sample variance computed with (n − 1) in the denominator is an unbiased estimator of the population variance. Its square root, the sample standard deviation, is the usual estimator of the population standard deviation, although strictly speaking it is slightly biased, because taking a square root does not preserve unbiasedness.
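The idea of an expected value can be checked by brute force on a small data set. The sketch below is only an illustration under stated assumptions: it treats the ten weights from this article's worked example as the whole population, and it assumes samples of size 2 drawn with replacement, so that all 100 ordered samples are equally likely. Neither the sample size nor the sampling scheme comes from the article, and the helper names are hypothetical. Averaging an estimator over every possible sample gives its expected value exactly.

```python
from fractions import Fraction
from itertools import product

# Assumption: these ten weights (kg) are treated as the entire population.
weights = [70, 62, 65, 72, 80, 70, 63, 72, 77, 79]


def mean(values):
    """Exact arithmetic mean, returned as a Fraction."""
    return sum(values, Fraction(0)) / len(values)


def sum_sq_dev_over(values, divisor):
    """Sum of squared deviations from the mean of `values`, divided by `divisor`."""
    m = mean(values)
    return sum(((v - m) ** 2 for v in values), Fraction(0)) / divisor


population_mean = mean(weights)
population_variance = sum_sq_dev_over(weights, len(weights))

# Assumption: samples of size 2, drawn with replacement (all equally likely).
n = 2
samples = list(product(weights, repeat=n))

# Expected value of each estimator = its average over all possible samples.
expected_sample_mean = mean([mean(s) for s in samples])
expected_var_n_minus_1 = mean([sum_sq_dev_over(s, n - 1) for s in samples])
expected_var_n = mean([sum_sq_dev_over(s, n) for s in samples])

print("population mean:", population_mean)
print("expected sample mean:", expected_sample_mean)
print("population variance:", float(population_variance))
print("expected variance, divisor n - 1:", float(expected_var_n_minus_1))
print("expected variance, divisor n:", float(expected_var_n))
```

Under these assumptions, the average of the sample means equals the population mean, and the average of the (n − 1)-divisor variances equals the population variance, while dividing by n underestimates the variance on average.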
What is population standard deviation?
When data from the whole population can be taken into account (for example, in the case of a census), it is possible to calculate the population standard deviation. To calculate it, first the deviations of the data values from the population mean are calculated. The root mean square (quadratic mean) of these deviations is called the population standard deviation.
In a class of 10 students, data about the students can easily be collected. If a hypothesis is tested on this population of students, there is no need to use sample values. For example, the weights of the 10 students (in kilograms) are measured to be 70, 62, 65, 72, 80, 70, 63, 72, 77 and 79. Then the mean weight of the ten students is (70+62+65+72+80+70+63+72+77+79)/10 = 710/10, which is 71 (in kilograms). This is the population mean.
Now, to calculate the population standard deviation, we calculate the deviations from the mean. The respective deviations from the mean are (70 − 71) = −1, (62 − 71) = −9, (65 − 71) = −6, (72 − 71) = 1, (80 − 71) = 9, (70 − 71) = −1, (63 − 71) = −8, (72 − 71) = 1, (77 − 71) = 6 and (79 − 71) = 8. The sum of the squared deviations is $(-1)^2 + (-9)^2 + (-6)^2 + 1^2 + 9^2 + (-1)^2 + (-8)^2 + 1^2 + 6^2 + 8^2 = 366$. The population standard deviation is $\sqrt{366/10} \approx 6.05$ (in kilograms). Here 71 is the mean weight of the students in the class and 6.05 (rounded to two decimal places) is the standard deviation of the weights about that mean. Both are population parameters rather than estimates, because every student in the class was measured.
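The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library. The variable names are illustrative, and the last line cross-checks the manual result against `statistics.pstdev`, which computes the population standard deviation.

```python
import math
import statistics

# Weights (kg) of the 10 students, treated as the whole population.
weights = [70, 62, 65, 72, 80, 70, 63, 72, 77, 79]

N = len(weights)
mu = sum(weights) / N                      # population mean

deviations = [w - mu for w in weights]     # signed deviations from the mean
sum_sq = sum(d ** 2 for d in deviations)   # sum of squared deviations

sigma = math.sqrt(sum_sq / N)              # divide by N, then take the root

print("mean:", mu)
print("sum of squared deviations:", sum_sq)
print("population standard deviation:", round(sigma, 2))
print("statistics.pstdev:", round(statistics.pstdev(weights), 2))
```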
What is sample standard deviation?
When data from a sample (of size n) are used to estimate parameters of the population, the sample standard deviation is calculated. First the deviations of the data values from the sample mean are calculated. Since the sample mean is used in place of the population mean (which is unknown), taking the quadratic mean is not appropriate. In order to compensate for the use of the sample mean, the sum of squares of deviations is divided by (n − 1) instead of n. The sample standard deviation is the square root of this. In mathematical symbols, $S = \sqrt{\sum (x_i - \bar{x})^2 / (n - 1)}$, where S is the sample standard deviation, $\bar{x}$ is the sample mean and the $x_i$ are the data points.
Now assume that, in the previous example, the population is the students of the whole school. Then, the class will be only a sample. If this sample is used in the estimation, the sample standard deviation will be √(366/9) = 6.38 (in kilograms) since 366 was divided by 9 instead of 10 (the sample size). The fact to observe is that this is not guaranteed to be the exact population standard deviation value. It is merely an estimate for it.
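The same calculation with the (n − 1) divisor can be sketched in Python as follows. The variable names are illustrative, and `statistics.stdev` from the standard library, which uses the (n − 1) divisor, serves as a cross-check.

```python
import math
import statistics

# The same 10 weights (kg), now treated as a sample from the whole school.
weights = [70, 62, 65, 72, 80, 70, 63, 72, 77, 79]

n = len(weights)
x_bar = sum(weights) / n                          # sample mean

sum_sq = sum((w - x_bar) ** 2 for w in weights)   # sum of squared deviations

s = math.sqrt(sum_sq / (n - 1))                   # divide by n - 1, not n

print("sample mean:", x_bar)
print("sample standard deviation:", round(s, 2))
print("statistics.stdev:", round(statistics.stdev(weights), 2))
```

The only change from the population calculation is the divisor, which makes the sample value slightly larger than the population value for the same data.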
What is the difference between population standard deviation and sample standard deviation?
• Population standard deviation is the exact parameter value used to measure the dispersion from the center, whereas the sample standard deviation is an estimator of it. (The sample variance, with (n − 1) in the denominator, is an unbiased estimator of the population variance.)
• Population standard deviation is calculated when the data for every individual of the population are known. Otherwise, the sample standard deviation is calculated.
• Population standard deviation is given by $\sigma = \sqrt{\sum (x_i - \mu)^2 / n}$, where $\mu$ is the population mean and n is the population size, but the sample standard deviation is given by $S = \sqrt{\sum (x_i - \bar{x})^2 / (n - 1)}$, where $\bar{x}$ is the sample mean and n is the sample size.
