The data analysed in this exemplar are taken from the Scottish Household Survey for the years 2001 and 2002. The data used are taken from the interview with the 'Random adult'.
The data for this exemplar are available in the ESRC data archive, but the data sets at the archive do not identify primary sampling units (PSUs) because of concern about the confidentiality of respondents. A SAS data set with PSUs added was made available by the Scottish Executive (SHS team). To prevent identification of individuals we have modified the data and taken other precautions as described here. The data set includes only the variables needed for the analyses illustrated here.
Two data problems were identified: some clusters were not nested in strata, and some strata had only one PSU. To overcome these problems, data corrections were carried out in SAS.
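The two problems described above can be checked for before any analysis is run. The following is a minimal sketch in Python, not the SAS code actually used for the corrections. The function name `find_design_problems` and the example records are hypothetical, and it assumes that each record carries a stratum identifier and a PSU identifier that is meant to be unique across the whole survey.

```python
from collections import defaultdict


def find_design_problems(rows):
    """Return (single_psu_strata, non_nested_psus) for (stratum, psu) pairs.

    Hypothetical helper for illustration; assumes PSU identifiers are
    intended to be unique across strata.
    """
    psus_in_stratum = defaultdict(set)
    strata_of_psu = defaultdict(set)
    for stratum, psu in rows:
        psus_in_stratum[stratum].add(psu)
        strata_of_psu[psu].add(stratum)

    # Strata containing exactly one PSU give no within-stratum variance estimate.
    single_psu_strata = sorted(
        s for s, psus in psus_in_stratum.items() if len(psus) == 1
    )
    # PSUs that appear in more than one stratum are not nested in strata.
    non_nested_psus = sorted(
        p for p, strata in strata_of_psu.items() if len(strata) > 1
    )
    return single_psu_strata, non_nested_psus


# Made-up records, one (stratum, psu) pair per respondent.
records = [
    ("A", 1), ("A", 1), ("A", 2),
    ("B", 3),            # stratum B has only one PSU
    ("C", 4), ("C", 2),  # PSU 2 appears in both stratum A and stratum C
    ("C", 5),
]

singles, non_nested = find_design_problems(records)
print("Strata with a single PSU:", singles)
print("PSUs not nested in strata:", non_nested)
```

If PSU identifiers are only unique within a stratum in a given data set, the second check would flag every reused identifier, so it applies only under the uniqueness assumption stated above.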
The program to extract and correct the data was written in SAS and is ex2_prep.sas. It produces a SAS data set, ex2.sas7bdat, and a transport file to read into other systems (ex2.xpt).
To investigate how each package handles data problems, a further data set was produced from the data without correcting them. The program to produce this is ex2_prep_nc.sas; it produces a SAS data set (ex2_nc.sas7bdat) and a transport file to read into other systems (ex2_nc.xpt).
The following steps were used to read the SAS data sets into other packages:
- The code to read the files ex2.xpt and ex2_nc.xpt into R is in the files exp2_prep.R and exp2_nc.R, which produce ex2.RData and ex2_nc.RData.
- To read ex2.xpt and ex2_nc.xpt into Stata, ex2_prep.do and ex2_prep_nc.do were used; they produce ex2.dta and ex2_nc.dta.
- To read ex2.sas7bdat and ex2_nc.sas7bdat into SPSS, the code in ex2_prep.SPS and ex2_prep_nc.SPS was used; it produces ex2.sav and ex2_nc.sav.
The data sets can all be accessed from the main page of exemplar 2.