MAKING IT USEABLE
Interpreting Estimates
Sampling Error
Whenever a sample is drawn, by definition, only that part of the population that is included in the sample is measured, and is used to represent the entire population. Hence, there must always be some error in the data, resulting from those members of the population who were not measured. Error will, therefore, be reduced as the sample size is increased, so that, if a census is performed (a 100 percent sample is a census), by definition there will be no sampling error.
If a population from which a sample is drawn is large, then population values will not be affected very much by one or two members of the population who have extreme values of a particular measure. For example, the population of the U.S. is about 100 million households; most households have between 1 and 6 members; if 100,000 households have 10 members, and the remaining 99.9 million average 2.300 persons per household, the average for all households will change from 2.300 to 2.308. As a result of this, samples that are quite small in numeric size and very small as a percentage of the population will often have very small sampling errors.
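The household arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch; the function name `mean_with_extremes` and its parameters are made up for this example and are not part of any survey tool.

```python
def mean_with_extremes(n_total, n_extreme, extreme_value, base_mean):
    """Overall mean when n_extreme members take extreme_value
    and all remaining members average base_mean."""
    n_rest = n_total - n_extreme
    return (n_extreme * extreme_value + n_rest * base_mean) / n_total


# Figures taken from the text: 100 million households, of which
# 100,000 have 10 members and the rest average 2.300 persons.
overall = mean_with_extremes(
    n_total=100_000_000,
    n_extreme=100_000,
    extreme_value=10,
    base_mean=2.300,
)
print(round(overall, 3))  # 2.308
```

Even with 100,000 extreme households, the overall mean moves by less than one hundredth of a person per household.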
Sampling error of a mean value estimated from a sample is equal to the estimated standard deviation of the variable divided by the square root of the sample size. It is therefore not dependent on the population size, but only on the variability of the variable of concern and sample size. For example:
- If one drew a sample of four observations from a large population, the sampling error would be equal to the standard deviation divided by 2 (the square root of four).
- To halve the sampling error of that variable, one would need to increase the sample to 16; it could be halved again by increasing the sample size to 64; and halved again by increasing the sample to 256.
- If a sample of 1,024 were selected, the sampling error would be 1/32 of the standard deviation; because standard deviations on many variables are fairly small values, this represents a very small error.
It also follows that increasingly large increases in sample size are necessary to continue to decrease the sampling error - to halve the error again to 1/64th of the standard deviation would require an increase of the sample size to 4,096, while halving again would require an increase to 16,384.
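The square-root relationship described above can be tabulated directly. The sketch below assumes a standard deviation of 1.0, so that each result reads as a fraction of the standard deviation; `standard_error` is a hypothetical helper name chosen for this illustration.

```python
import math


def standard_error(std_dev, n):
    """Sampling error of a mean: standard deviation over sqrt(sample size)."""
    return std_dev / math.sqrt(n)


# Each fourfold increase in sample size halves the sampling error.
for n in (4, 16, 64, 256, 1024, 4096, 16384):
    print(f"n = {n:>6}: sampling error = {standard_error(1.0, n):.7f}")
```

The sample size has to grow from 4 to 16,384 (a factor of 4,096) to reduce the sampling error from 1/2 to 1/128 of the standard deviation (a factor of 64).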
For a question that has a Yes/No answer, the standard deviation will be a maximum of 0.5; a sample of 1,024 observations will therefore provide a sampling error of at most about 0.016 (0.5 divided by 32), irrespective of the size of the population, provided the population is large. This is how national opinion polls can provide estimates of votes for a candidate or an issue within very small percentage errors (say 3 percentage points at 95 percent confidence, which is roughly twice the sampling error) with samples of only a little more than 1,000 voters from the entire U.S. population. In fact, once samples are larger than several hundred, the errors are already small and further increases in sample size do not reduce sampling error very much.
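As a rough check of the opinion-poll figures, the sketch below uses the standard result that a Yes/No variable coded 0/1 with a share p of Yes answers has a standard deviation of sqrt(p(1 - p)), which peaks at 0.5 when p = 0.5. The function name `binary_standard_error` is an assumption made for this example, and the 1.96 multiplier is the usual normal-distribution value for 95 percent confidence.

```python
import math


def binary_standard_error(p, n):
    """Sampling error of an estimated proportion p from a sample of size n,
    assuming a large population and simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


# Worst case (p = 0.5) for a sample of 1,024 respondents.
se = binary_standard_error(0.5, 1024)
margin_95 = 1.96 * se  # half-width of the 95 percent confidence interval

print(f"sampling error: {se:.4f}")          # 0.0156
print(f"95% margin:     {margin_95:.4f}")   # 0.0306, i.e. about 3 points
```

Any other value of p gives a smaller sampling error than the p = 0.5 case, so 0.016 is an upper bound for a sample of this size.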
The diagram below shows the relationship of sample size and sampling error for the case of a simple two-valued variable, like a Yes/No answer.
Figure: Relationship of Percentage Sampling Error to Sample Size - Simple Binary Variable (horizontal axis: Sample Size)
It is also important to stress, as noted above, that when the population is large, sampling error is not a function of the percentage of the population in the sample. It is a function only of the number in the sample and the variability of the measure of concern:
- In the illustrations above, only the sample size needed to be known to determine how much the sampling error decreased.
- If sampling error were a function of the percentage of the population sampled, then this would make it dependent on population size as well.
- Population size plays a role only when the population numbers a few hundred, so that sampling error declines to zero when the entire population is measured. This is done through a finite population correction factor, which is equal to the square root of one minus the sampling fraction, a term that becomes zero when the sample equals the population.
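The effect of the finite population correction can be sketched as follows, using the form given above (the square root of one minus the sampling fraction). The function name `corrected_standard_error` and the example population sizes are assumptions chosen purely for illustration.

```python
import math


def corrected_standard_error(std_dev, n, population_size):
    """Sampling error of a mean with the finite population correction,
    sqrt(1 - n / N), applied to std_dev / sqrt(n)."""
    sampling_fraction = n / population_size
    fpc = math.sqrt(1 - sampling_fraction)
    return fpc * std_dev / math.sqrt(n)


# Hypothetical cases, all with a standard deviation of 0.5.
cases = [
    (100, 400),            # small population, 25 percent sampled
    (400, 400),            # census: the correction drives the error to zero
    (1024, 100_000_000),   # very large population: correction is negligible
]
for n, N in cases:
    se = corrected_standard_error(0.5, n, N)
    print(f"n = {n:>5}, N = {N:>11,}: sampling error = {se:.6f}")
```

For the very large population the corrected value is practically the same as the uncorrected 0.5 / 32, which is why population size can be ignored in that case.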
In sampling, a large population is one that is measured in at least thousands of members, so that the population of the United States (either in persons or households) is considered a very large population for statistical purposes.
Generally, only three things need to be known to estimate sampling error:
- The method used to draw the sample.
- The size of the sample.
- The variance of the measure for which sampling error is to be determined.
An important property of sampling error is that it can always be calculated, if sampling follows one of the standard sampling procedures, such as equal probability sampling, stratified random sampling, etc.
Another important property of sampling error is that it can be considered to have approximately a normal distribution. This means that we can use properties of the normal distribution to understand what is meant by a particular value of sampling error. For example:
- The normal distribution tells us that there is a 95 percent probability that the estimated value from the sample is within plus or minus about two (1.96) standard deviations of the population value, where the standard deviation in question is the sampling error.
- If we have computed a sample mean of household size of 2.6 and we determine that the sampling error on this value is 0.01, then we have 95 percent confidence that the estimated value of 2.6 is within 0.02 of the population value.
- There is also a 99 percent probability that the estimated sample mean is within plus or minus 2.58 standard deviations of the population value.
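The household-size example can be worked through in code. This is a minimal sketch using the 1.96 and 2.58 multipliers quoted above; `confidence_interval` is a hypothetical helper name, not a function from any particular library.

```python
def confidence_interval(estimate, sampling_error, z=1.96):
    """Return (low, high) bounds: estimate plus or minus z sampling errors."""
    half_width = z * sampling_error
    return estimate - half_width, estimate + half_width


# Sample mean household size of 2.6 with a sampling error of 0.01.
low95, high95 = confidence_interval(2.6, 0.01)           # 95 percent
low99, high99 = confidence_interval(2.6, 0.01, z=2.58)   # 99 percent

print(f"95%: {low95:.4f} to {high95:.4f}")  # 2.5804 to 2.6196
print(f"99%: {low99:.4f} to {high99:.4f}")  # 2.5742 to 2.6258
```

The higher confidence level comes at the cost of a wider interval around the same estimate.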
Clearly, the bigger the sampling error, the less sure we are about the population value; this is determined by either a large standard deviation for the variable of interest, or a small sample, or both. This is illustrated later in this training module.
Because sampling error is a function of the method used to draw the sample, extracting subsets of the data may have the effect of invalidating the error estimation, if the subsets are drawn in such a way as to introduce a substantial non-random element into the subset.
Strictly speaking, sampling errors can only be calculated for the overall sample and for random subsets of the sample. For example:
- Selecting all households in the NPTS that reported owning zero vehicles produces a nonrandom subset, for which sampling errors cannot be calculated.
- On the other hand, one can estimate the sampling error for characteristics of zero-car-owning households as part of the full NPTS sample.
- Similarly, if one were to draw a random sub-sample of 1,000 households from the entire NPTS, sampling errors could be calculated exactly for that random sub-sample.
- This also means that sampling errors can be calculated exactly for a regional subsample of the NPTS, if that sub-sample is drawn so as to include entire strata from the stratified sampling design of the survey.
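Drawing a random sub-sample and computing its sampling error might look like the sketch below. The household sizes here are synthetic stand-in values generated only for illustration, not NPTS data, and the variable names are assumptions. The calculation also assumes simple random sampling, whereas a real survey with a stratified design would need its design weights taken into account.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic stand-in for a full survey sample of household sizes.
full_sample = [rng.choice([1, 2, 2, 3, 3, 4, 5, 6]) for _ in range(20_000)]

# Random sub-sample of 1,000 households, drawn without replacement.
subsample = rng.sample(full_sample, 1000)

# Sampling error of the sub-sample mean: std deviation / sqrt(n).
mean_size = statistics.mean(subsample)
sampling_error = statistics.stdev(subsample) / math.sqrt(len(subsample))

print(f"mean household size: {mean_size:.3f}")
print(f"sampling error:      {sampling_error:.4f}")
```

Because every household in the full sample had the same chance of entering the sub-sample, the usual formula applies to it unchanged.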