Clustering the sample almost always increases the standard error
of survey estimates relative to the standard error for a simple
random sample of the same size. This means that design effects
and design factors are increased as a result of the clustered design.
The degree of increase depends upon two things:
(i) the sample size per cluster
(ii) the homogeneity of the clusters.
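The two factors combine in a widely used approximation due to Kish,
deff = 1 + (b - 1)ρ, where b is the average number of sampled
individuals per cluster and ρ is the intra-cluster correlation
coefficient described below. The design factor is deft = √deff, the
ratio of the clustered standard error to the simple random sampling
standard error. The sketch below assumes roughly equal cluster sizes
and ignores stratification and weighting, so it is an approximation
rather than the method behind any particular published design effect;
the function names are hypothetical.

```python
import math


def design_effect(rho: float, cluster_size: float) -> float:
    """Kish's approximate design effect for a clustered sample.

    rho: intra-cluster correlation coefficient (ICC).
    cluster_size: average number of sampled individuals per cluster (b).
    Assumes roughly equal cluster sizes; stratification and weighting ignored.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return 1.0 + (cluster_size - 1.0) * rho


def design_factor(rho: float, cluster_size: float) -> float:
    """Design factor (deft): square root of the design effect, i.e. the
    ratio of the clustered standard error to the SRS standard error."""
    return math.sqrt(design_effect(rho, cluster_size))


if __name__ == "__main__":
    # Same ICC, increasing sample size per cluster.
    for b in (5, 10, 20):
        print(b, round(design_effect(0.05, b), 2), round(design_factor(0.05, b), 3))
```

Holding ρ fixed, the design effect grows linearly with the sample
size per cluster; holding b fixed, it grows with ρ. When ρ is zero
the design effect is 1, whatever the cluster size.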
For a given total sample size, the standard error tends to increase
as the sample size per cluster increases.
The homogeneity of the clusters is measured by the
intra-cluster correlation coefficient (ICC, or ρ, 'roh'). If the
individuals within a cluster have more in common than individuals
in the population as a whole, then ρ will be greater than zero.
If, at the extreme, all individuals within a cluster are
identical yet there is some between-cluster variation, then
ρ will be equal to 1.
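For equal-sized clusters, one standard way to estimate the ICC (ρ)
from sample data is the one-way ANOVA estimator
ρ̂ = (MSB - MSW) / (MSB + (m - 1)MSW), where MSB and MSW are the
between-cluster and within-cluster mean squares and m is the number of
sampled individuals per cluster. The sketch below (the function name
`anova_icc` is hypothetical) assumes equal cluster sizes and an
unweighted sample. It reproduces the extreme case just described:
when everyone within a cluster is identical, MSW is zero and the
estimate is exactly 1.

```python
from statistics import mean


def anova_icc(clusters: list[list[float]]) -> float:
    """One-way ANOVA estimate of the intra-cluster correlation.

    clusters: one list of observed values per cluster; all clusters must
    have the same number of values (an assumption of this sketch).
    """
    k = len(clusters)
    if k < 2:
        raise ValueError("need at least 2 clusters")
    m = len(clusters[0])
    if m < 2 or any(len(c) != m for c in clusters):
        raise ValueError("clusters must all have the same size, at least 2")

    grand = mean(x for c in clusters for x in c)
    cluster_means = [mean(c) for c in clusters]

    # Between-cluster mean square: spread of cluster means around the grand mean.
    msb = m * sum((cm - grand) ** 2 for cm in cluster_means) / (k - 1)
    # Within-cluster mean square: spread of individuals around their own cluster mean.
    msw = sum(
        (x - cm) ** 2 for c, cm in zip(clusters, cluster_means) for x in c
    ) / (k * (m - 1))

    denom = msb + (m - 1) * msw
    if denom == 0:
        raise ValueError("no variation in the data; the ICC is undefined")
    return (msb - msw) / denom


if __name__ == "__main__":
    # Identical values within each cluster, different between clusters.
    print(anova_icc([[1, 1, 1], [2, 2, 2], [3, 3, 3]]))
    # Identical cluster means, all variation within clusters.
    print(anova_icc([[1, 2, 3], [1, 2, 3], [1, 2, 3]]))
```

The second example shows that the estimator can go below zero when
clusters are less alike internally than the population as a whole;
its lower limit is -1/(m - 1).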
As the ICC increases, so does the standard error. This makes
some intuitive sense: if the individuals within a cluster are all
alike, but are different to individuals from other clusters, then
with a clustered sample there is an increased risk of drawing a
sample that happens to be very different to the population, and
this risk is reflected in the standard error. The risk increases
as the number of clusters selected decreases (or, equivalently,
for a fixed total sample size, as the sample size per cluster
increases).
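A small Monte Carlo experiment can make this intuition concrete. The
sketch below builds a hypothetical population in which each value is a
shared cluster effect plus individual noise; the population size,
standard deviations and random seed are illustrative assumptions, not
taken from any real survey. It then repeatedly draws samples of 100
individuals, either by simple random sampling or by selecting clusters
and then individuals within them, and uses the standard deviation of
the sample mean across repetitions as the empirical standard error.

```python
import functools
import random
import statistics


def build_population(n_clusters, cluster_size, sd_between, sd_within, rng):
    """Hypothetical population: each value is a shared cluster effect plus individual noise."""
    population = []
    for _ in range(n_clusters):
        effect = rng.gauss(0.0, sd_between)  # common to everyone in this cluster
        population.append(
            [effect + rng.gauss(0.0, sd_within) for _ in range(cluster_size)]
        )
    return population


def cluster_sample_mean(population, n_clusters, per_cluster, rng):
    """Two-stage sample: choose clusters at random, then individuals within each cluster."""
    chosen = rng.sample(population, n_clusters)
    values = [y for cluster in chosen for y in rng.sample(cluster, per_cluster)]
    return statistics.fmean(values)


def srs_sample_mean(everyone, n, rng):
    """Simple random sample of n individuals, drawn without replacement."""
    return statistics.fmean(rng.sample(everyone, n))


def simulated_se(draw, reps):
    """Empirical standard error: spread of the estimate over repeated samples."""
    return statistics.stdev([draw() for _ in range(reps)])


def main(reps=2000, seed=2024):
    rng = random.Random(seed)
    population = build_population(200, 50, sd_between=1.0, sd_within=2.0, rng=rng)
    everyone = [y for cluster in population for y in cluster]

    # Every design below samples 100 individuals in total.
    results = {
        "SRS, n=100": simulated_se(
            functools.partial(srs_sample_mean, everyone, 100, rng), reps
        )
    }
    for k, b in ((20, 5), (10, 10), (5, 20)):
        draw = functools.partial(cluster_sample_mean, population, k, b, rng)
        results[f"{k} clusters x {b}"] = simulated_se(draw, reps)
    return results


if __name__ == "__main__":
    for design, se in main().items():
        print(f"{design:>18}: SE of mean = {se:.3f}")
```

With these settings the clustered designs should show larger standard
errors than the simple random sample, and the standard error should
grow as the same 100 interviews are concentrated in fewer clusters.
The exact figures depend on the seed and the assumed variances.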
In a household-based survey where the primary sampling units
(PSUs) are geographical areas (such as postcode sectors), the
variables with relatively high ICCs tend to be those with
relatively little within-area variation. Tenure and dwelling
type are examples.
The results from exemplars 1 and 2 illustrate this for real data.