1995 Standard Error Calculations

See the paper titled "Standard Error Calculations for NPTS Estimates" for the full documentation of the 1995 NPTS web site standard error calculations. An overview is presented below.

A standard error is a statistic that measures the precision of another statistic. Consider, for example, a simple random sample $x_1, \ldots, x_n$ of $n$ observations, and let $M$ denote the ordinary sample mean of the $x_i$: $M = \sum x_i / n$. Introductory statistics texts explain that the standard error of $M$ is $s / n^{1/2}$, where $s$, the standard deviation, is $\left[ \left( \sum x_i^2 - n M^2 \right) / (n-1) \right]^{1/2}$. Standard errors for NPTS estimates are similar but more complicated, because the NPTS sampling strategy is more complicated than simple random sampling. Standard errors are often used to compute confidence intervals: under many circumstances, an estimate ± 1.96 standard errors is a good approximate 95% confidence interval for the quantity being estimated.
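As a rough illustration of the simple random sample case only (not of the NPTS procedure, which accounts for a more complicated sampling design), the mean, its standard error, and the approximate 95% interval could be computed as in the Python sketch below. The function names and the sample values are made up for this example.

```python
import math


def mean_and_standard_error(xs):
    """Return (M, standard error of M) for a simple random sample."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    m = sum(xs) / n
    # Standard deviation: s = sqrt((sum of x_i^2 - n * M^2) / (n - 1))
    s = math.sqrt((sum(x * x for x in xs) - n * m * m) / (n - 1))
    # Standard error of the mean: s / sqrt(n)
    return m, s / math.sqrt(n)


def approx_95_interval(xs):
    """Return the estimate minus and plus 1.96 standard errors."""
    m, se = mean_and_standard_error(xs)
    return m - 1.96 * se, m + 1.96 * se


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values, not NPTS data
m, se = mean_and_standard_error(sample)
low, high = approx_95_interval(sample)
print(f"mean={m:.3f}  se={se:.3f}  95% CI=({low:.3f}, {high:.3f})")
```

The sum-of-squares form of s mirrors the formula in the text; for data with large values and small spread, a two-pass computation on deviations from the mean is numerically safer.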

When you are subsetting the data, you must consider the household level where clause. Standard errors of NPTS estimates depend on the numbers of households represented in the estimates. It is therefore necessary that subsetting statements like "where state eq 'AL'" or "where sex eq 'F'" are properly accounted for in the standard error calculations. A subsetting restriction like "where state eq 'AL'", for example, reduces the number of households considered, but "where sex eq 'F'" typically would not. Ordinarily "where sex eq 'F'" would simply indicate that results for males should not be counted in the tabulations, and not that any households should be excluded from the sample.

If a where statement component involves only household level variables (e.g., "where state eq 'AL'"), call it a household (HH) level component. If a where statement component involves only lower level data variables (e.g., "where sex eq 'F'"), call it a lower level (LL) component. Call a where statement mixed if it combines both HH and LL components, as in "where state eq 'AL' and sex eq 'F'". Inferring the correct interpretation can become especially complicated when the subsetting restrictions are mixed (see the documentation for additional information).

Therefore, the tablew.sas macro (and web interface) handles user-supplied where statements as follows. If a statement has only HH components, it is assumed to exclude households from the sample. If it has only LL components, it is assumed to exclude observations from the tabulations, but not households from the sample. If the where statement is mixed, then the macro requires that the user explicitly distinguish which households should be used in the analysis by supplying a separate where statement to define the subset of observations to be used. Since that separate where statement will nearly always involve only HH components, it is referred to as a household level statement. The components in this household level statement may be repeated (without effect) in the original where statement. See the documentation for more information.
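The three cases above can be sketched in Python as follows. This is only an illustration of the decision rule, not the logic of tablew.sas itself: the variable sets, function names, and returned messages are hypothetical, and the only variable levels assumed are the two used as examples in the text (state as household level, sex as lower level).

```python
import re

# Assumed variable levels, taken only from the examples in the text.
HH_VARS = {"state"}  # household level variables
LL_VARS = {"sex"}    # lower level variables


def classify_where(where):
    """Classify a where statement as 'HH', 'LL', 'mixed', or 'unknown'."""
    # Drop quoted literals such as 'AL' so they are not read as variable names.
    unquoted = re.sub(r"'[^']*'", " ", where.lower())
    names = set(re.findall(r"[a-z_][a-z0-9_]*", unquoted))
    has_hh = bool(names & HH_VARS)
    has_ll = bool(names & LL_VARS)
    if has_hh and has_ll:
        return "mixed"
    if has_hh:
        return "HH"
    if has_ll:
        return "LL"
    return "unknown"


def describe_handling(where, hh_where=None):
    """Describe how a where statement would be treated under the rule above."""
    kind = classify_where(where)
    if kind == "HH":
        return "exclude households from the sample"
    if kind == "LL":
        return "exclude observations from the tabulations only"
    if kind == "mixed":
        if hh_where is None:
            raise ValueError("a mixed where statement needs a household level statement")
        return "households defined by: " + hh_where
    raise ValueError("no known variables in where statement")


print(classify_where("where state eq 'AL'"))
print(classify_where("where sex eq 'F'"))
print(describe_handling("where state eq 'AL' and sex eq 'F'",
                        hh_where="where state eq 'AL'"))
```

In the mixed case the household level statement repeats the "state eq 'AL'" component of the original statement, which the text notes has no additional effect.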

We have the following resources online: