Sometimes non-response re-weighting can be carried out by
comparing the characteristics of those who responded to a survey with those of the
whole group that the survey attempted to reach. This is not very helpful
when (as in most household surveys) we know little about the people
who do not respond.
This approach is most helpful when we are selecting a sample
from an informative sampling frame. Examples might be surveys of a workforce,
where we know the grade, age and length of service of all employees. It is
also used in longitudinal surveys, when a survey re-contacts people who
responded at a previous wave of the survey.
The re-weighting is done in three steps.
(i) Investigate which factors predict whether a response has been
received.
(ii) Apply a weight to the responding cases that is proportional to 1/(probability
of responding).
(iii) Finally, the weights, at this stage generally above 1.0, are usually
rescaled so as to add to the responding sample numbers.
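Steps (ii) and (iii) can be written compactly. The notation here is an assumption introduced for illustration: p_i is the estimated probability of response for responding case i, w_i is its raw weight, w_i* is its rescaled weight, and n_r is the number of responding cases.

```latex
w_i = \frac{1}{\hat{p}_i}, \qquad
w_i^{*} = w_i \cdot \frac{n_r}{\sum_{j=1}^{n_r} w_j}, \qquad
\text{so that} \quad \sum_{i=1}^{n_r} w_i^{*} = n_r .
```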
At the first step a response variable, coded as 1 for responders and 0 for
non-responders, is attached to the sampling frame.
Where the sampling frame tells us only a few things about the units, we can
divide the sample up into groups (called response classes), and the probability
of response is simply the proportion who respond in each response class.
For example, in a survey of community groups one might know
only a limited number of things about them, such as their location and their
funding source. With two categories for each, this gives four response classes,
as shown below, and the probability of response can be calculated for each.
The most underrepresented group (urban, not publicly funded) gets the highest
weight to bring the sample into line with the sampling frame.
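The response-class calculation can be sketched in a few lines of Python. The counts are hypothetical, chosen only so that the urban, not publicly funded class has the lowest response rate. The label "rural" and the function name `response_class_weights` are also assumptions made for illustration.

```python
# Hypothetical counts (not from the survey described in the text):
# response class -> (units on the sampling frame, units that responded)
classes = {
    ("urban", "publicly funded"): (100, 80),
    ("urban", "not publicly funded"): (100, 40),
    ("rural", "publicly funded"): (100, 50),
    ("rural", "not publicly funded"): (100, 50),
}

def response_class_weights(classes):
    """Return one rescaled non-response weight per response class."""
    # Step (i): probability of response = proportion responding in the class.
    probs = {k: responded / total for k, (total, responded) in classes.items()}
    # Step (ii): raw weight proportional to 1 / (probability of responding).
    raw = {k: 1.0 / p for k, p in probs.items()}
    # Step (iii): rescale so the weights, summed over responding cases,
    # add up to the number of responding cases.
    n_responders = sum(responded for _, responded in classes.values())
    weighted_total = sum(raw[k] * responded
                         for k, (_, responded) in classes.items())
    scale = n_responders / weighted_total
    return {k: raw[k] * scale for k in classes}

weights = response_class_weights(classes)
for key, weight in weights.items():
    print(key, round(weight, 4))
```

With these made-up counts the class with the lowest response rate receives the largest weight, and the weighted responders in each class are again proportional to the class sizes on the frame.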
When the sampling frame contains more detailed information about the non-responders,
the factors that influence non-response are often investigated via logistic
regression. The fitted model is then used to calculate each unit's probability
of response at step (i), and steps (ii) and (iii) are carried out
in the same way as before.
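A minimal sketch of the model-based version follows, using only the Python standard library. Everything in it is an assumption made for illustration: the data are hypothetical, the single covariate is a made-up indicator (1 for a senior grade, 0 otherwise), and the model is fitted by plain gradient ascent. In practice the model would normally be fitted with a statistics package.

```python
import math

def predict_prob(beta, x):
    """Modelled probability of response for one unit with covariates x."""
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, n_iter=5000):
    """Fit a logistic regression by gradient ascent on the average
    log-likelihood. Returns [intercept, coefficient_1, ...]."""
    beta = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for x, responded in zip(X, y):
            err = responded - predict_prob(beta, x)
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def responder_weights(X, y, beta):
    """Steps (ii) and (iii): 1/p for responders, rescaled to sum to their number."""
    raw = [1.0 / predict_prob(beta, x) for x, r in zip(X, y) if r == 1]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]

# Hypothetical frame: 10 units with x = 0 (5 respond), 10 with x = 1 (8 respond).
X = [[0]] * 10 + [[1]] * 10
y = [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2

beta = fit_logistic(X, y)
weights = responder_weights(X, y, beta)
print([round(b, 3) for b in beta])
print(round(weights[0], 4), round(weights[-1], 4))
```

With a single binary covariate the fitted probabilities coincide with the response-class proportions, so the model only adds something once the frame supplies several predictors or continuous ones.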
When this type of regression model is used we need to strike a balance between
having a powerful model to predict non-response (and so reduce bias) and
the introduction of extreme weights that will affect precision
(see weighting section 3.7).
Models are often simplified at the final stage to avoid extreme weights
(either large or small). Another practice that some surveys employ is to
cap the weights, for example by replacing all weights above 2.5 with the
value of 2.5.
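Capping can be sketched as below. The weights are hypothetical. The effective sample size shown, (sum of weights) squared divided by the sum of squared weights, is Kish's approximation; it is not taken from the text and is included here only as one common way of summarising the loss of precision caused by unequal weights.

```python
def cap_weights(weights, cap=2.5):
    """Replace every weight above the cap with the cap value."""
    return [min(w, cap) for w in weights]

def effective_sample_size(weights):
    """Kish's approximation: (sum of w)^2 / (sum of w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

# Hypothetical weights with one extreme value.
weights = [0.8, 0.9, 1.0, 1.1, 6.0]
capped = cap_weights(weights)

print(capped)
print(effective_sample_size(weights), effective_sample_size(capped))
```

Capping changes the sum of the weights, so a survey may choose to rescale them again afterwards. It also moves the weighted sample away from the frame, which is the bias side of the trade-off described above.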