Sections in this log:
  Mean income with different design assumptions
  Subgroup: lone parents
  Raking to match Scottish totals
  Jackknife estimation for mean

(In the original HTML version, Stata output is in green, commands are in white and comments are in blue.)
. /*----------------------------------------------------------
> first get the simple mean and confidence interval of
> the income (unweighted)
> ----------------------------------------------------------*/
.
. ci hhinc

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
       hhinc |       4695    470.8643    6.022048        459.0583    482.6704

.
.
. svyset [pwei=gross2],psu(psu)
pweight is gross2
psu is psu

. svydes

pweight:  gross2
Strata:
PSU:      psu
                                      #Obs per PSU
 Strata                       ----------------------------
             #PSUs     #Obs        min      mean       max
--------  --------  --------  --------  --------  --------
       1       320      4695         3      14.7        23
--------  --------  --------  --------  --------  --------
       1       320      4695         3      14.7        23

.
.
. svymean hhinc

Survey mean estimation

pweight:  gross2                                  Number of obs    =      4695
Strata:                                           Number of strata =         1
PSU:      psu                                     Number of PSUs   =       320
                                                  Population size  =   2236979

------------------------------------------------------------------------------
    Mean |   Estimate    Std. Err.   [95% Conf. Interval]        Deff
---------+--------------------------------------------------------------------
   hhinc |   483.0913    10.63943     462.159    504.0236    2.900452
------------------------------------------------------------------------------

.
. svyset, clear(all)
no variables are set

. svyset [pwei=gross2]
pweight is gross2

. svydes

pweight:  gross2
Strata:
PSU:
                                      #Obs per PSU
 Strata                       ----------------------------
             #PSUs     #Obs        min      mean       max
--------  --------  --------  --------  --------  --------
       1      4695      4695         1       1.0         1
--------  --------  --------  --------  --------  --------
       1      4695      4695         1       1.0         1

. svymean hhinc

Survey mean estimation

pweight:  gross2                                  Number of obs    =      4695
Strata:                                           Number of strata =         1
PSU:                                              Number of PSUs   =      4695
                                                  Population size  =   2236979

------------------------------------------------------------------------------
    Mean |   Estimate    Std. Err.   [95% Conf. Interval]        Deff
---------+--------------------------------------------------------------------
   hhinc |   483.0913    7.877496    467.6477    498.5349    1.590031
------------------------------------------------------------------------------

.
. /*---------------------------------------
> now testing the wrong kind of weights
> see Stata's help for weights for a good explanation
> ------------------------------------------------------*/
. ci hhinc [aweight=gross2]

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
       hhinc |       4695    483.0913      6.2472        470.8438    495.3387

. ci hhinc [fweight=gross2]

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
       hhinc |    2236979    483.0913    .2861713        482.5304    483.6522

. /*------------------------------------------------------------
> Now clustering but no weighting
>
> WHICH HAS THE LARGER EFFECT ON THE DESIGN EFFECT:
> CLUSTERING OR WEIGHTING?
> ----------------------------------------------------------------*/
.
. svyset, clear(all)
no variables are set

. svyset ,psu(psu)
psu is psu

. svydes

pweight:
Strata:
PSU:      psu
                                      #Obs per PSU
 Strata                       ----------------------------
             #PSUs     #Obs        min      mean       max
--------  --------  --------  --------  --------  --------
       1       320      4695         3      14.7        23
--------  --------  --------  --------  --------  --------
       1       320      4695         3      14.7        23

. svymean hhinc

Survey mean estimation

pweight:                                          Number of obs    =      4695
Strata:                                           Number of strata =         1
PSU:      psu                                     Number of PSUs   =       320
                                                  Population size  =      4695

------------------------------------------------------------------------------
    Mean |   Estimate    Std. Err.   [95% Conf. Interval]        Deff
---------+--------------------------------------------------------------------
   hhinc |   470.8643    8.907628    453.3392    488.3894    2.187942
------------------------------------------------------------------------------

.
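The Deff values reported by the svymean runs above can be checked by hand. This is a minimal sketch, assuming that Deff is the ratio of the design-based variance to the variance a simple random sample of the same size would give, and taking the standard errors from the `ci` runs (6.2472 with aweights, 6.022048 unweighted) as that simple-random-sample baseline. The symbols SE_design and SE_srs are notation introduced here, not Stata's.

```latex
\begin{aligned}
\mathrm{Deff} &= \left(\frac{SE_{\text{design}}}{SE_{\text{srs}}}\right)^{2} \\[4pt]
\text{weights and clustering:}\quad & (10.63943 / 6.2472)^{2} \approx 2.90 \\
\text{weights only:}\quad & (7.877496 / 6.2472)^{2} \approx 1.59 \\
\text{clustering only:}\quad & (8.907628 / 6.022048)^{2} \approx 2.19
\end{aligned}
```

These agree with the reported Deff values of 2.900452, 1.590031 and 2.187942 to two decimal places, so the three runs can be compared on a common scale when considering the question posed in the comment.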
. /*-------------------------------------------
> Now subgroups
>
> First for the survey data with weighting only
> and looking at the subgroup of single parents
> ---------------------------------------------------------------*/
. gen lonep=(adulth==1 & depchldh>0)

. svyset[pweight=gross2] , clear(strata psu )
pweight is gross2

. svymean hhinc if lonep == 1 , available

Survey mean estimation

pweight:  gross2                                  Number of obs    =       334
Strata:                                           Number of strata =         1
PSU:                                              Number of PSUs   =       334
                                                  Population size  =    131801

------------------------------------------------------------------------------
    Mean |   Estimate    Std. Err.   [95% Conf. Interval]        Deff
---------+--------------------------------------------------------------------
   hhinc |   276.5555    8.223924    260.3781    292.7329      .94421
------------------------------------------------------------------------------

. /*--------------------------------------------------------
> now add clustering
> ---------------------------------------------------*/
. svyset, psu(psu) clear(strata)
pweight is gross2
psu is psu

. svymean hhinc if lonep==1, available

Survey mean estimation

pweight:  gross2                                  Number of obs    =       334
Strata:                                           Number of strata =         1
PSU:      psu                                     Number of PSUs   =       196
                                                  Population size  =    131801

------------------------------------------------------------------------------
    Mean |   Estimate    Std. Err.   [95% Conf. Interval]        Deff
---------+--------------------------------------------------------------------
   hhinc |   276.5555    8.503837    259.7842    293.3268    1.009579
------------------------------------------------------------------------------

.
. /*-------------------------------------------------
> This survey was weighted and clustered and
> poststratified at the UK level. To poststratify
> at the Scotland level we need to get the additional data
>
>
> Go to the data window in Stata and check how
> the variable CTBAND relates to the PSUs
>
> Now define the survey as though it were STRATIFIED
> by council tax band
>
> Look at how many PSUs you have now
> ---------------------------------------------------*/
. svyset [pwei=gross2],psu(psu) strata(ctband)
pweight is gross2
strata is ctband
psu is psu

. svymean hhinc

Survey mean estimation

pweight:  gross2                                  Number of obs    =      4695
Strata:   ctband                                  Number of strata =         9
PSU:      psu                                     Number of PSUs   =      1572
                                                  Population size  =   2236979

------------------------------------------------------------------------------
    Mean |   Estimate    Std. Err.   [95% Conf. Interval]        Deff
---------+--------------------------------------------------------------------
   hhinc |   483.0913    8.193814    467.0193    499.1633    1.720289
------------------------------------------------------------------------------

. /*------------------------------------------------
> You should find that Stata has made many more
> PSUs by dividing each PSU into subgroups according
> to CTBAND.
>
> Of course this is wrong. Because of the post-stratification
> we need to use the SVR library for Stata, which uses
> replication methods.
> ------------------------------------------------------------*/
.
. /*---------------------------------------------------
> First step is to install the svr library
> ---------------------------------------------------------------------*/
. ssc install svr
checking svr consistency and verifying not already installed...
all files already exist and are up-to-date.

. clear

. set matsize 800

Current memory allocation

                    current                                 memory usage
    settable          value     description                 (1M = 1024k)
    --------------------------------------------------------------------
    set maxvar         5000     max. variables allowed           1.733M
    set memory          10M     max. data space                 10.000M
    set matsize         800     max. RHS vars in models          4.950M
                                                            -----------
                                                                16.682M

.
. set memory 500M

Current memory allocation

                    current                                 memory usage
    settable          value     description                 (1M = 1024k)
    --------------------------------------------------------------------
    set maxvar         5000     max. variables allowed           1.733M
    set memory         500M     max. data space                500.000M
    set matsize         800     max. RHS vars in models          4.950M
                                                            -----------
                                                               506.682M

. /*-----------------------------------------------------------------------------------------------
> the next bit of code adds the Scotland totals by council tax band and
> tenure type so they can be used to rake the sample to match both of these
> --------------------------------------------------------------------------------------------------*/
. replace cttot= 24.83 if ctband==1
(1023 real changes made)

. replace cttot= 24.62 if ctband==2
(1139 real changes made)

. replace cttot= 15.45 if ctband==3
(801 real changes made)

. replace cttot=11.91 if ctband==4
(679 real changes made)

. replace cttot=11.96 if ctband==5
(528 real changes made)

. replace cttot=5.95 if ctband==6
(288 real changes made)

. replace cttot=3.94 if ctband==7
(179 real changes made)

. replace cttot=0.45 if ctband==8
(18 real changes made)

. replace cttot=0.89 if ctband==9
(40 real changes made)

. gen tentot=0

. replace tentot= 62.63 if tenure==1
(3008 real changes made)

. replace tentot= 21.59 if tenure==2
(1105 real changes made)

. replace tentot= 5.58 if tenure==3
(284 real changes made)

.
. /*-------------------------------------------------------------------------------
> and make a set of jackknife weights for this survey
> This next command will create 320 new variables (one for each replicate) where one
> of the 320 PSUs is dropped from each replication. Look at the data to check this
> -------------------------------------------------------------------------------*/
. survwgt create jk1, psu(psu) weight(gross2)
Generating replicate weights.........................................................................
> ...................................................................................................
> ...................................................................................................
> .................................................

Created weights and set svr values:
meth  jk1
pw    gross2
rw    jk1_1 jk1_2 jk1_3 jk1_4 jk1_5 jk1_6 jk1_7 jk1_8 jk1_9 jk1_10
      jk1_11 jk1_12 jk1_13 jk1_14 jk1_15 jk1_16 jk1_17 jk1_18 jk1_19 jk1_20
      jk1_21 jk1_22 jk1_23 jk1_24 jk1_25 jk1_26 jk1_27 jk1_28 jk1_29 jk1_30
      jk1_31 jk1_32 jk1_33 jk1_34 jk1_35 jk1_36 jk1_37 jk1_38 jk1_39 jk1_40
      jk1_41 jk1_42 jk1_43 jk1_44 jk1_45 jk1_46 jk1_47 jk1_48 jk1_49 jk1_50
      jk1_51 jk1_52 jk1_53 jk1_54 jk1_55 jk1_56 jk1_57 jk1_58 jk1_59 jk1_60
      jk1_61 jk1_62 jk1_63 jk1_64 jk1_65 jk1_66
      [lines missed]
      jk1_281 jk1_282 jk1_283 jk1_284 jk1_285 jk1_286 jk1_287 jk1_288 jk1_289 jk1_290
      jk1_291 jk1_292 jk1_293 jk1_294 jk1_295 jk1_296 jk1_297 jk1_298 jk1_299 jk1_300
      jk1_301 jk1_302 jk1_303 jk1_304 jk1_305 jk1_306 jk1_307 jk1_308 jk1_309 jk1_310
      jk1_311 jk1_312 jk1_313 jk1_314 jk1_315 jk1_316 jk1_317 jk1_318 jk1_319 jk1_320
dof   319
fay   0
psun

. /*---------------- now use the survey replication commands--------------------------------*/
. survwgt rake [all] , by(ctband tenure) totvars( cttot tentot) replace

SVR settings updated:
pw    gross2
rw    jk1_1 jk1_2 jk1_3 jk1_4 jk1_5 jk1_6 jk1_7 jk1_8 jk1_9 jk1_10
      jk1_11 jk1_12 jk1_13 jk1_14 jk1_15 jk1_16 jk1_17 jk1_18 jk1_19 jk1_20
      jk1_21 jk1_22 jk1_23 jk1_24 jk1_25 jk1_26 jk1_27 jk1_28 jk1_29 jk1_30
      jk1_31 jk1_32 jk1_33 jk1_34 jk1_35 jk1_36 jk1_37 jk1_38 jk1_39 jk1_40
      jk1_41 jk1_42 jk1_43 jk1_44 jk1_45 jk1_46 jk1_47 jk1_48 jk1_49 jk1_50
      jk1_51 jk1_52 jk1_53
      [lines missed]
      jk1_271 jk1_272 jk1_273 jk1_274 jk1_275 jk1_276 jk1_277 jk1_278 jk1_279 jk1_280
      jk1_281 jk1_282 jk1_283 jk1_284 jk1_285 jk1_286 jk1_287 jk1_288 jk1_289 jk1_290
      jk1_291 jk1_292 jk1_293 jk1_294 jk1_295 jk1_296 jk1_297 jk1_298 jk1_299 jk1_300
      jk1_301 jk1_302 jk1_303 jk1_304 jk1_305 jk1_306 jk1_307 jk1_308 jk1_309 jk1_310
      jk1_311 jk1_312 jk1_313 jk1_314 jk1_315 jk1_316 jk1_317 jk1_318 jk1_319 jk1_320

.
. svrmean hhinc

Survey mean estimation, replication (jk1) variance method

Analysis weight:      gross2                  Number of obs       =       4695
Replicate weights:    jk1_1...                Population size     =    2242012
Number of replicates: 320                     Degrees of freedom  =        319

------------------------------------------------------------------------------
    Mean |   Estimate    Std. Err.   [95% Conf. Interval]        Deff
---------+--------------------------------------------------------------------
   hhinc |   475.9539    7.399009    461.3969    490.5109    1.422339
------------------------------------------------------------------------------

.
end of do-file
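
For reference, here is a sketch of what the jk1 replication in this log is usually taken to compute. It is the textbook delete-one-PSU jackknife, and it is an assumption that the svr commands implement exactly this form; the symbols are notation introduced here. With R = 320 replicates, replicate r sets the weights in PSU r to zero and scales the remaining weights by R/(R-1). The mean is then re-estimated with each set of replicate weights, and the variance is built from the spread of the replicate estimates around the full-sample estimate.

```latex
\begin{aligned}
w_i^{(r)} &=
  \begin{cases}
    0 & \text{if unit } i \text{ is in PSU } r \\[2pt]
    \dfrac{R}{R-1}\, w_i & \text{otherwise}
  \end{cases} \\[6pt]
\hat{v}_{\mathrm{JK1}}(\hat{\theta}) &=
  \frac{R-1}{R} \sum_{r=1}^{R} \left(\hat{\theta}_{(r)} - \hat{\theta}\right)^{2},
  \qquad R = 320
\end{aligned}
```

In this log the full-sample weight and every replicate weight are additionally raked to the council tax band and tenure margins (`survwgt rake [all]`) before `svrmean` is run, so each replicate estimate reflects the raking step as well. The reported 319 degrees of freedom correspond to R - 1.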