Weighting is often needed in surveys because of
stratification with disproportionate sampling.
Another important reason for weighting is to adjust for
survey non-response. For a
comparison with other, non-survey, reasons for weighting see
below.
Weighting may also be required for other sample designs
that use non-equal probabilities of selection. In many
surveys some units are, usually for reasons of convenience,
given a higher probability of selection than others. This is
dealt with in the analysis by weighting the data. Examples
include:
- selection of individuals within households
- selection of events
1. Selection of individuals within households
It is usual in surveys of the general population to select an
equal probability sample of addresses from the Postcode
Address File and then to:
- enumerate the households at that address and select
all, or a sub-sample, of the households, and
- enumerate the adults within each household and
select just one for interview.
This results in adults from larger households being
under-sampled relative to adults from smaller households. It
can also lead to households
from multi-household addresses being under-sampled.
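The correction for this under-sampling is a weight equal to the inverse of the within-address selection probability. The following is a minimal sketch rather than a definitive implementation: the function name, its parameters and the example respondents are all hypothetical, and it assumes that households and adults are each selected with equal probability within their address or household.

```python
from fractions import Fraction


def within_address_weight(n_households, n_households_selected,
                          n_adults, n_adults_selected=1):
    """Inverse of the probability that an adult is selected,
    given that their address is already in the sample.

    All names here are illustrative assumptions.
    """
    p_household = Fraction(n_households_selected, n_households)
    p_adult = Fraction(n_adults_selected, n_adults)
    return 1 / (p_household * p_adult)


# Hypothetical respondents:
# (households at address, households selected, adults in household)
respondents = [
    (1, 1, 1),  # single adult, single-household address
    (1, 1, 3),  # one of three adults selected
    (2, 1, 2),  # one of two households, then one of two adults
]

weights = [within_address_weight(h, hs, a) for h, hs, a in respondents]
for (h, hs, a), w in zip(respondents, weights):
    print(f"households={h}, selected={hs}, adults={a} -> weight {w}")
```

In this sketch an adult from a three-adult household receives three times the weight of an adult living alone, which offsets the under-sampling described above.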
2. Selection of events
In some surveys the unit of analysis is an event
rather than an individual. For instance, in the Health Survey
for England, some analysis is done on the accidents that
individuals experience, where the unit of analysis is an
accident rather than the individual having the accident.
What often happens in these cases is that, for each
individual in the survey, the events of interest are
enumerated and a random sub-sample of events per person
selected. Details are then only collected on the sub-sample
of events. This tends to mean that the probability of
selection for each event is smaller for individuals with a
larger number of events than for those with a smaller number
of events.
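The corresponding event-level weight can be sketched as the number of events a person reported divided by the number for which details were collected. This is an illustration under stated assumptions: the cap of two events per person and the function name are hypothetical and are not taken from the Health Survey for England.

```python
def event_weight(n_events_reported, n_events_sampled):
    """Each sampled event represents n_reported / n_sampled events
    for that person (assumes equal-probability sub-sampling)."""
    if not 0 < n_events_sampled <= n_events_reported:
        raise ValueError("sampled events must be between 1 and the number reported")
    return n_events_reported / n_events_sampled


# Hypothetical rule: details are collected on at most two events per person.
MAX_EVENTS_DETAILED = 2

events_reported = [1, 2, 5]  # illustrative counts for three people
event_weights = [
    event_weight(n, min(n, MAX_EVENTS_DETAILED)) for n in events_reported
]
for n, w in zip(events_reported, event_weights):
    print(f"{n} event(s) reported -> weight per sampled event {w}")
```

For an analysis of events, this weight would be multiplied by any person-level weight already attached to the respondent.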
Why non-equal probabilities of selection are used:
Survey sample designers would probably avoid non-equal
probabilities of selection that are not part of the
stratification if they could. They usually cannot be
avoided because the units on the sampling frame do
not match one-to-one with the units of analysis.
Instead the analysis units cluster within the frame units.
For instance, individuals cluster within households, and
households cluster within addresses, yet
‘address’ is often the sampling frame unit.
Take the example of selecting individuals from a sampling
frame of addresses. To select a strictly
equal probability sample of individuals there are only a
limited number of options:
- If you have a count of the number of individuals per
address then you can over-sample ‘large’
addresses in proportion to their size and then select one
individual per address. The under-sampling at the second
stage then cancels out the over-sampling at the first
stage. In practice this approach is not an option in the
UK because the Postcode Address File (PAF)
does not have a count of individuals.
- Select addresses with equal probability and then select
all individuals at that address for the survey. This
approach is sometimes used in surveys, but it has some
serious problems attached to it. Firstly, in large
households the survey can become very burdensome.
Secondly, depending upon the subject of the survey, the
responses of individuals from the same household may be
highly correlated, in which case within-household
clustering effects will come into play.
So, rather than accept these problems, survey designers often
opt for the alternative of selecting addresses with equal
probability, selecting just one individual per
household, and then weighting the data to deal with the
non-equal probabilities of selection.
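The arithmetic behind these designs can be checked on a small example. The sketch below uses a made-up frame of four addresses and assumes, for simplicity, one household per address; the variable names and numbers are illustrative only. It compares the size-proportional design, where over-sampling and under-sampling cancel, with the equal-probability design that needs weights.

```python
from fractions import Fraction

# Hypothetical frame: number of adults at each of four addresses.
adults_per_address = [1, 2, 3, 4]
n_addresses_sampled = 2  # expected number of addresses drawn

total_adults = sum(adults_per_address)
n_addresses = len(adults_per_address)

# Design A: select addresses with probability proportional to size,
# then one adult per address. The two stages cancel out.
prob_size_proportional = [
    Fraction(n_addresses_sampled * size, total_adults) * Fraction(1, size)
    for size in adults_per_address
]

# Design B: select addresses with equal probability, then one adult.
prob_equal_address = [
    Fraction(n_addresses_sampled, n_addresses) * Fraction(1, size)
    for size in adults_per_address
]

# Design B needs weights equal to the inverse selection probability.
design_weights = [1 / p for p in prob_equal_address]

print("Design A probabilities:", [str(p) for p in prob_size_proportional])
print("Design B probabilities:", [str(p) for p in prob_equal_address])
print("Design B weights:", [str(w) for w in design_weights])
```

Under Design A every adult has the same overall probability of selection, so no weight is needed; under Design B the weight is proportional to the number of adults at the address.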