Methods used to analyse surveys require that we specify
the survey design. Sometimes the data will not agree with the design that
has been specified. There are two main problems that can occur.
If the design is correct, a PSU should never be split across strata.
Splits can arise from data errors, or because the naming convention has
re-used PSU identifiers in different strata (e.g. PSU 1 in stratum 1 may
be a different unit from PSU 1 in stratum 2).
Most methods of analysing surveys calculate the variability of the
estimates from differences between PSUs within strata (more about
stratification), so they will fail if there are any split PSUs.
One way round the problem is to subdivide the split PSUs into sub-PSUs so
that they are nested within strata. This is the right thing to do if it is
just a labelling problem. If only a few PSUs among thousands are split,
subdividing may be acceptable, but in general it is not good practice.
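The subdivision described above can be sketched in a few lines of Python. This is only an illustration, not the method used by any particular package: the record layout and the field names `stratum` and `psu` are assumptions made for the example. Each PSU label is combined with its stratum label, so that a PSU identifier re-used in two strata becomes two distinct sub-PSUs.

```python
def nest_psus_within_strata(records):
    """Return copies of the records with PSU labels made unique per stratum.

    Each record is assumed to be a dict with 'stratum' and 'psu' keys
    (hypothetical field names chosen for this sketch).
    """
    return [
        {**rec, "psu": f"{rec['stratum']}_{rec['psu']}"}
        for rec in records
    ]


# PSU 1 appears in both stratum 1 and stratum 2.
sample = [
    {"stratum": 1, "psu": 1, "y": 10.0},
    {"stratum": 2, "psu": 1, "y": 12.5},
    {"stratum": 2, "psu": 3, "y": 9.0},
]
relabelled = nest_psus_within_strata(sample)
```

After relabelling, every PSU belongs to exactly one stratum. As noted above, this is appropriate only when the split is a labelling problem; if the split comes from data errors, relabelling hides the error rather than fixing it.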
Packages can provide two things to help:
- different options about how they handle split PSUs
- tools to find out where the problem is.
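Where a package offers no tool for locating the problem, a check along the following lines may help. It is a minimal sketch in Python, assuming each record is a dict with `stratum` and `psu` keys (hypothetical names, not taken from any package); it lists every PSU identifier that occurs in more than one stratum.

```python
from collections import defaultdict


def find_split_psus(records):
    """Map each split PSU identifier to the sorted list of strata it occurs in.

    Records are assumed to be dicts with 'stratum' and 'psu' keys
    (hypothetical field names chosen for this sketch).
    """
    strata_by_psu = defaultdict(set)
    for rec in records:
        strata_by_psu[rec["psu"]].add(rec["stratum"])
    # Keep only the PSUs seen in two or more strata.
    return {
        psu: sorted(strata)
        for psu, strata in strata_by_psu.items()
        if len(strata) > 1
    }


survey = [
    {"stratum": 1, "psu": 1},
    {"stratum": 1, "psu": 2},
    {"stratum": 2, "psu": 1},
]
split = find_split_psus(survey)
```

An empty result means the PSUs are already nested within strata. A non-empty result shows which identifiers to inspect, but it cannot say whether the cause is a data error or a re-used label.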
Table 7.1 Packages and split PSUs
Examples of the effect of splitting PSUs on results can be found in
Exemplar 1 and Exemplar 2.