Stata OUTPUT FOR EXEMPLAR 2


Stata commands are in white, output is in green and yellow, and warnings are in red.
Comments on interpretation of the output are in blue.

For comments on running the analyses go to the commented code file.


Links within this page

Simple means and proportions Table 2.3
Proportions for subgroups
Effect of design aspects on precision of estimates table 2.4
Chi square tests tables 2.6 onwards
Internet use by council area
Survey logistic regressions
. svyset [pwei=ind_wt],psu(psu) strata(stratum)
pweight is ind_wt
strata is stratum
psu is psu
. svyprop intuse
---------------------------------------------------------------------
pweight:  ind_wt                     Number of obs       =     28685
Strata:   stratum                    Number of strata    =       281
PSU:      psu                        Number of PSUs      =     11937
                                     Population size     = 28642.333
---------------------------------------------------------------------

Here the weights sum approximately to the sample size, since they are
not grossing-up weights. Hence the population size shown is misleading,
but the estimates are OK.
Survey proportions estimation

  +-------------------------------------------+
  | intuse     Obs     Est. Prop.  Std. Err.  |
  |-------------------------------------------|
  |      0   19823     0.658436    0.003405   |
  |      1    8862     0.341564    0.003405   |
  +-------------------------------------------+
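As a rough check on the comment about the weights, the figures printed above can be compared directly. The sketch below uses only numbers copied from this output (the variable names are ours, not Stata's). It suggests that the mean weight is very close to 1, and that the unweighted proportion of internet users is noticeably lower than the weighted estimate, so the weights still matter even though they do not gross up.

```python
# Figures copied from the svyprop output above.
n_obs = 28685             # Number of obs
sum_weights = 28642.333   # "Population size", i.e. the sum of the weights
obs_users = 8862          # unweighted count of cases with intuse == 1
weighted_prop = 0.341564  # Est. Prop. for intuse == 1

# Mean weight: close to 1 when the weights are scaled to the sample size.
mean_weight = sum_weights / n_obs

# Proportion of internet users if the weights were ignored.
unweighted_prop = obs_users / n_obs

print(f"mean weight           = {mean_weight:.4f}")
print(f"unweighted proportion = {unweighted_prop:.4f}")
print(f"weighted proportion   = {weighted_prop:.4f}")
```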

. svymean intuse , ci deff
To get design effects, we need to take the mean of the 0/1 variable.
Survey mean estimation

pweight:  ind_wt                        Number of obs    =     28685
Strata:   stratum                       Number of strata =       281
PSU:      psu                           Number of PSUs   =     11937
                                        Population size  = 28642.333

----------------------------------------------------------------------
    Mean |   Estimate       Std. Err.   [95% Conf. Interval]     Deff
-------+------------------------------------------------------------
  intuse  |   .3415642    .0034046    .3348905    .3482378    1.478395
----------------------------------------------------------------------
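The design effect printed above can be roughly reproduced by hand. The sketch below assumes the simple-random-sampling variance of a proportion is p(1-p)/n with no finite population correction (none was given to svyset); Stata's exact formula may differ slightly, so treat this as an approximation rather than the program's own calculation.

```python
import math

# Figures copied from the svymean output above.
p = 0.3415642          # Estimate
se_design = 0.0034046  # Std. Err. under the full survey design
n = 28685              # Number of obs

# Assumption: SRS standard error approximated by sqrt(p(1-p)/n).
se_srs = math.sqrt(p * (1 - p) / n)

# Design effect = ratio of design-based variance to SRS variance.
deff = (se_design / se_srs) ** 2

print(f"SRS std. err. = {se_srs:.7f}")
print(f"approx. Deff  = {deff:.4f}   (printed: 1.478395)")
```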

. svymean intuse, deff deft  by(sex)

 Survey mean estimation

pweight:  ind_wt                           Number of obs   =     28685
Strata:   stratum                         Number of strata =       281
PSU:      psu                             Number of PSUs   =     11937
                                          Population size  = 28642.333

-----------------------------------------------------------------
Mean   Subpop. |   Estimate    Std. Err.       Deff        Deft
---------------+-------------------------------------------------
intuse         |
          male |   .3851593    .0051239    1.405104    1.185371
        female |   .3070535    .0043296    1.410531    1.187658
-----------------------------------------------------------------
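Deff is a ratio of variances and Deft the corresponding ratio of standard errors, so with no finite population correction Deft should simply be the square root of Deff. A minimal check against the printed values (dictionary names are ours):

```python
import math

# Values copied from the svymean ... by(sex) output above.
deff_printed = {"male": 1.405104, "female": 1.410531}
deft_printed = {"male": 1.185371, "female": 1.187658}

# Deft recomputed as the square root of the printed Deff.
deft_from_deff = {sex: math.sqrt(d) for sex, d in deff_printed.items()}

for sex in deff_printed:
    print(f"{sex:>6}: sqrt(Deff) = {deft_from_deff[sex]:.6f}, "
          f"printed Deft = {deft_printed[sex]:.6f}")
```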

. svymean rc5
stratum with only one PSU detected
This failure happens because restricting to internet users only leaves
at least one stratum with a lonely PSU. When a stratum has only one PSU
it is not possible to estimate the variability within it, and this can
cause programs to fail. The next command shows which strata are affected.
It would be tiresome to sort them all out.

. svydes if intuse==1
pweight:  ind_wt
Strata:   stratum
PSU:      psu
                                      #Obs per PSU
 Strata                       ----------------------------
 stratum    #PSUs     #Obs       min      mean       max
--------  --------  --------  --------  --------  --------
    100A       110       110         1       1.0         1
    100B        60        60         1       1.0         1
Lines missed out here
    180E        10        10         1       1.0         1
    180F        18        18         1       1.0         1
    180G        17        17         1       1.0         1
    180H         1*        1         1       1.0         1
    180I        63        63         1       1.0         1
	A * indicates a lonely PSU 

    More lines omitted here.
	A few more lonely PSUs can be seen among them.

. svyset [pweight=ind_wt],  clear( strata psu pweight )
pweight is ind_wt

. svymean intuse , ci deff

Survey mean estimation

pweight:  ind_wt                             Number of obs    =     28685
Strata:   <one>                              Number of strata =         1
PSU:      <observations>                Number of PSUs   =     28685
                                        Population size  = 28642.333

------------------------------------------------------------------------
    Mean |   Estimate    Std. Err.      [95% Conf. Interval]     Deff
---------+--------------------------------------------------------------
  intuse  |   .3415642    .0032671    .3351606    .3479678    1.361344
------------------------------------------------------------------------

. svyset ,psu(psu)
pweight is ind_wt
psu is psu

. svymean intuse , ci deff

Survey mean estimation

pweight:  ind_wt                         Number of obs    =     28685
Strata:   <one>                          Number of strata =         1
PSU:      psu                            Number of PSUs   =     11937
                                         Population size  = 28642.333

----------------------------------------------------------------------
    Mean |   Estimate    Std. Err.     [95% Conf. Interval]     Deff
---------+------------------------------------------------------------
  intuse |   .3415642    .0037742     .334166    .3489623    1.816821
----------------------------------------------------------------------

. svyset, clear(psu) strata(stratum)
pweight is ind_wt
strata is stratum

. svymean intuse , ci deff

Survey mean estimation

pweight:  ind_wt                                  Number of obs    =     28685
Strata:   stratum                                 Number of strata =       281
PSU:      <observations>                          Number of PSUs   =     28685
                                                  Population size  = 28642.333

---------------------------------------------------------------------
    Mean |   Estimate    Std. Err.   [95% Conf. Interval]        Deff
---------+-----------------------------------------------------------
  intuse |   .3415642    .0031608    .3353689    .3477594    1.274203
----------------------------------------------------------------------

. svyset,psu(psu)
pweight is ind_wt
strata is stratum
psu is psu
This restores the correct full design (weights, strata and PSUs).
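The four svymean runs above differ only in which design features are declared, so their standard errors can be set side by side. The sketch below recomputes an approximate design effect for each run, again assuming an SRS variance of p(1-p)/n (our assumption, not necessarily Stata's exact formula). On these figures, weighting alone inflates the variance, clustering inflates it further, and stratification pulls it back down.

```python
import math

p, n = 0.3415642, 28685              # estimate and number of obs (same in every run)
se_srs = math.sqrt(p * (1 - p) / n)  # assumed SRS standard error

# Std. Err. and Deff copied from the four svymean outputs above.
runs = {
    "weights only":            (0.0032671, 1.361344),
    "weights + PSUs":          (0.0037742, 1.816821),
    "weights + strata":        (0.0031608, 1.274203),
    "weights + strata + PSUs": (0.0034046, 1.478395),
}

# Approximate design effect for each run: (design SE / SRS SE) squared.
approx_deff = {name: (se / se_srs) ** 2 for name, (se, _) in runs.items()}

for name, (se, printed) in runs.items():
    print(f"{name:<24} SE = {se:.7f}  approx. Deff = {approx_deff[name]:.3f}"
          f"  printed = {printed:.3f}")
```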

. svytab intuse sex, count row percent

pweight:  ind_wt                            Number of obs      =     28685
Strata:   stratum                           Number of strata   =       281
PSU:      psu                               Number of PSUs     =     11937
                                            Population size    = 28642.333

-------------------------------------
Whether   |
person    |
uses the  |
--interne |            sex
t         |    male   female    Total
----------+--------------------------
        0 |    7781  1.1e+04  1.9e+04
          |   41.26    58.74      100
          |
        1 |    4874     4909     9783
          |   49.82    50.18      100
          |
    Total | 1.3e+04  1.6e+04  2.9e+04
          |   44.18    55.82      100
-------------------------------------
  Key:  weighted counts
        row percentages

  Pearson:
    Uncorrected   chi2(1)         =  191.8936
    Design-based  F(1, 11656)     =  143.8122     P = 0.0000
The counts in this table are the weighted counts, i.e. the sum of the
weights in each cell. This is not very helpful, especially since we don't
have grossing-up weights here. The actual base numbers in each row would be
much more useful. The format for counts is also not good, but it can
easily be fixed with the format() option (see table below).

. svytab  intuse rc5,
stratum with only one PSU detected

. svytab sex groc, count row percent format(%10.2f)

pweight:  ind_wt                       Number of obs      =     28685
Strata:   stratum                      Number of strata   =       281
PSU:      psu                          Number of PSUs     =     11937
                                       Population size    = 28642.333

----------------------------------------
          |             groc
      sex |        0         1     Total
----------+-----------------------------
     male | 12310.71    344.77  12655.48
          |    97.28      2.72    100.00
          |
   female | 15513.40    473.45  15986.86
          |    97.04      2.96    100.00
          |
    Total | 27824.11    818.22  28642.33
          |    97.14      2.86    100.00
----------------------------------------
  Key:  weighted counts
        row percentages

  Pearson:
    Uncorrected   chi2(1)         =    1.4349
    Design-based  F(1, 11656)     =    1.0726     P = 0.3004
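To make the "weighted counts" concrete, the row percentages in the table above can be recomputed from the weighted cell counts alone. This is only an illustrative sketch using the printed figures (the names are ours); small discrepancies come from the counts being rounded to two decimals.

```python
# Weighted cell counts copied from the svytab sex groc output above:
# (groc == 0, groc == 1) for each sex.
weighted_counts = {
    "male":   (12310.71, 344.77),
    "female": (15513.40, 473.45),
}

row_pct = {}
for sex, (groc0, groc1) in weighted_counts.items():
    row_total = groc0 + groc1  # sum of the weights in this row
    row_pct[sex] = (100 * groc0 / row_total, 100 * groc1 / row_total)

for sex, (pct0, pct1) in row_pct.items():
    print(f"{sex:>6}: groc=0 {pct0:6.2f}%   groc=1 {pct1:5.2f}%")
```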

. svymean intuse,  ci deff by(council)
Survey mean estimation

pweight:  ind_wt                              Number of obs    =     28685
Strata:   stratum                             Number of strata =       281
PSU:      psu                                 Number of PSUs   =     11937
                                              Population size  = 28642.333

---------------------------------------------------------------------------
Mean   Subpop. |   Estimate    Std. Err.   [95% Conf. Interval]        Deff
---------------+-----------------------------------------------------------
intuse         |
      Aberdeen |   .4259058     .015494    .3955349    .4562766    1.285795
      Clackman |    .300511    .0301624    .2413875    .3596344    1.158457
      Orkney_I |   .3083178    .0301204    .2492768    .3673588    .4625155
      PerthKin |   .3762066    .0225993    .3319081     .420505    1.634168
      Renfrews |   .3161091    .0174258    .2819515    .3502666    1.417949

only a few rows shown
------------------------------------------------------------------------------

. svymean intuse,  ci deff by(council) srssubpop
This gives design effects with respect to simple random sampling within the subgroups.
Survey mean estimation

pweight:  ind_wt                                  Number of obs    =     28685
Strata:   stratum                                 Number of strata =       281
PSU:      psu                                     Number of PSUs   =     11937
                                                  Population size  = 28642.333

------------------------------------------------------------------------------
Mean   Subpop. |   Estimate    Std. Err.   [95% Conf. Interval]        Deff
---------------+--------------------------------------------------------------
intuse         |
      Aberdeen |   .4259058     .015494    .3955349    .4562766    1.141851
      Clackman |    .300511    .0301624    .2413875    .3596344    2.246256
      Orkney_I |   .3083178    .0301204    .2492768    .3673588    2.612066
      PerthKin |   .3762066    .0225993    .3319081     .420505    1.466841
      Renfrews |   .3161091    .0174258    .2819515    .3502666    1.261359
only a few rows shown
------------------------------------------------------------------------------
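The standard errors are identical in the two council tables; only the simple-random-sampling baseline behind Deff changes. Since Deff is the squared ratio of the design standard error to the SRS standard error, dividing the printed Std. Err. by the square root of Deff recovers the baseline implied by each definition. The sketch below does this for two of the councils shown; reading the default baseline as an SRS drawn from the whole population is our interpretation and should be checked against the Stata documentation.

```python
import math

# (Std. Err., default Deff, Deff with srssubpop), copied from the two tables above.
rows = {
    "Aberdeen": (0.015494, 1.285795, 1.141851),
    "Orkney_I": (0.0301204, 0.4625155, 2.612066),
}

implied_srs_se = {}
for council, (se, deff_default, deff_subpop) in rows.items():
    # SRS standard error implied by each design effect: SE / sqrt(Deff).
    implied_srs_se[council] = (se / math.sqrt(deff_default),
                               se / math.sqrt(deff_subpop))

for council, (se_default, se_subpop) in implied_srs_se.items():
    print(f"{council:<9} implied SRS SE: default = {se_default:.5f}, "
          f"srssubpop = {se_subpop:.5f}")
```

For Orkney the default baseline is larger than the design-based standard error (Deff below 1), whereas the within-subgroup baseline is much smaller (Deff about 2.6).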
. tabulate groupinc, generate(groupinc)
This creates the dummy variables needed for the regressions.

   groupinc |      Freq.     Percent        Cum.
------------+-----------------------------------
    missing |        756        2.64        2.64
  under 10K |      8,840       30.82       33.45
     10-20K |     10,206       35.58       69.03
     20-30k |      5,472       19.08       88.11
     30-50k |      2,876       10.03       98.13
       50K+ |        535        1.87      100.00
------------+-----------------------------------
      Total |       28,685      100.00

. svylogit intuse  groupinc3-groupinc6 if groupinc>0,prob deff deft or 
Survey logistic regression

pweight:  ind_wt                           Number of obs    =     28685
Strata:   stratum                          Number of strata =       281
PSU:      psu                              Number of PSUs   =     11937
                                           Population size  = 28642.333
                                           F(   4,  11653)  =    686.78
                                           Prob > F         =    0.0000

------------------------------------------------------------------------------
      intuse | Odds Ratio   Std. Err.      t    P>|t|         Deff       Deft
-------------+----------------------------------------------------------------
   groupinc3 |     2.0628   .0916902    16.29   0.000     1.245006   1.115799
   groupinc4 |   5.224951   .2463328    35.07   0.000     1.322352   1.149936
   groupinc5 |   13.08222   .7537283    44.63   0.000     1.420008   1.191641
   groupinc6 |   22.63186   2.864421    24.65   0.000     1.587204   1.259843
------------------------------------------------------------------------------


. logistic intuse  groupinc3-groupinc6 if groupinc>0, or
This is a simple unweighted logistic regression, ignoring the survey design.
Logistic regression                               Number of obs   =      28685
                                                  LR chi2(4)      =    4898.43
                                                  Prob > chi2     =     0.0000
Log likelihood = -15285.327                       Pseudo R2       =     0.1381

------------------------------------------------------------------------------
      intuse | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   groupinc3 |   2.549186   .0980338    24.33   0.000     2.364106    2.748756
   groupinc4 |    6.627238   .2735814    45.81   0.000     6.112148    7.185737
   groupinc5 |    15.67939   .7972235    54.13   0.000      14.1922    17.32243
   groupinc6 |     26.5441   2.922164    29.78   0.000     21.39251    32.93626
------------------------------------------------------------------------------
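The survey regression above prints Deff and Deft in place of confidence intervals. Approximate intervals can be recovered from the odds ratios and their standard errors by working on the log scale, assuming the printed Std. Err. is the delta-method value (odds ratio times the standard error of the log odds ratio) and using a normal multiplier of 1.96. The sketch first checks the method against the interval printed for the unweighted model, then applies it to the survey estimates; the function name is ours.

```python
import math

def or_conf_interval(odds_ratio, std_err, z=1.96):
    """Approximate 95% CI for an odds ratio from its delta-method std. err."""
    se_log = std_err / odds_ratio  # standard error of log(odds ratio)
    log_or = math.log(odds_ratio)
    return math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)

# Check against the unweighted output (printed CI: 2.364106 to 2.748756).
unweighted_check = or_conf_interval(2.549186, 0.0980338)
print("unweighted groupinc3:", unweighted_check)

# Odds ratios and standard errors copied from the svylogit output.
survey = {
    "groupinc3": (2.0628, 0.0916902),
    "groupinc4": (5.224951, 0.2463328),
    "groupinc5": (13.08222, 0.7537283),
    "groupinc6": (22.63186, 2.864421),
}
survey_ci = {name: or_conf_interval(*vals) for name, vals in survey.items()}
for name, (lo, hi) in survey_ci.items():
    print(f"{name}: OR = {survey[name][0]:.3f}, approx. 95% CI {lo:.3f} to {hi:.3f}")
```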

. tabulate shs_6cla, generate (rural)

           Urban/rural classification |      Freq.     Percent        Cum.
--------------------------------------+-----------------------------------
Urban settlements of over 125,000 pop |     10,298       35.96       35.96
                        Other urban   |      8,352       29.16       65.12
             Small access towns,3-10k |      2,937       10.26       75.38
        Small remote towns, pop 3-10k |      1,290        4.50       79.88
   Accessible rural, pop<3k, drive>0  |      3,264       11.40       91.28
       Remote rural, pop<3k, drive>30 |      2,498        8.72      100.00
--------------------------------------+-----------------------------------
                                Total |     28,639      100.00

. svylogit intuse groupinc3-groupinc6 rural2-rural6 if groupinc>0,prob or

Survey logistic regression

pweight:  ind_wt                                  Number of obs    =     28639
Strata:   stratum                                 Number of strata =       281
PSU:      psu                                     Number of PSUs   =     11922
                                                  Population size  = 28593.511
                                                  F(   9,  11633)  =    306.37
                                                  Prob > F         =    0.0000

------------------------------------------------------------------------------
      intuse | Odds Ratio   Std. Err.      t    P>|t|
-------------+----------------------------------------------------------------
   groupinc3 |    2.08292   .0927013    16.49   0.000
   groupinc4 |   5.308005   .2510165    35.30   0.000
   groupinc5 |   13.30205   .7706116    44.67   0.000 
   groupinc6 |   22.71765   2.889008    24.56   0.000  
      rural2 |   .8277699   .0339711    -4.61   0.000  
      rural3 |   .9343237   .0535642    -1.18   0.236  
      rural4 |   .9006713   .0927712    -1.02   0.310  
      rural5 |   .8625718   .0452299    -2.82   0.005  
      rural6 |   1.017198   .0685998     0.25   0.800  
------------------------------------------------------------------------------
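The denominator degrees of freedom in the F statistics on this page follow from the design. The sketch below assumes the usual rules that the design degrees of freedom are the number of PSUs minus the number of strata, and that an adjusted Wald test of k terms is referred to F(k, d - k + 1); the function names are ours. On that assumption it reproduces the degrees of freedom printed for the tables and regressions above.

```python
def design_df(n_psu, n_strata):
    """Design degrees of freedom: number of PSUs minus number of strata."""
    return n_psu - n_strata

def adjusted_wald_df(n_psu, n_strata, k):
    """Numerator and denominator df for an adjusted Wald test of k terms."""
    d = design_df(n_psu, n_strata)
    return k, d - k + 1

# svytab tests: F(1, 11656) with 11937 PSUs and 281 strata.
print("design df:", design_df(11937, 281))

# svylogit with 4 income dummies: printed F(4, 11653).
print("income model:", adjusted_wald_df(11937, 281, 4))

# svylogit with 4 income and 5 rural dummies, 11922 PSUs: printed F(9, 11633).
print("income + rural model:", adjusted_wald_df(11922, 281, 9))
```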

.  bspline,x(age)  power(3) gen(bs)
unrecognized command:  bspline
You will get this error if you have not installed the spline package,
which can be found and downloaded from the internet via the Stata help.
.  bspline,x(age)  power(3) gen(bs)
(1 missing value generated)
This approach could be extended to produce more complicated fits
like the one shown on the exemplar main page. We show only a simple
spline fit here.
. svymlogit intuse bs1-bs4,noconst deft deff

Survey multinomial logistic regression

pweight:  ind_wt                                  Number of obs    =     28684
Strata:   stratum                                 Number of strata =       281
PSU:      psu                                     Number of PSUs   =     11937
                                                  Population size  = 28641.023
                                                  F(   4,  11653)  =    748.87
                                                  Prob > F         =    0.0000

------------------------------------------------------------------------------
      intuse |      Coef.   Std. Err.      Deff       Deft
-------------+----------------------------------------------------------------
         bs1 |     3.267213   1.556268   1.487014   1.219432
         bs2 |    -.5498629     .41734   1.361043   1.166638
         bs3 |     .6622839    .447941   1.317879   1.147989
         bs4 |    -26.89027   2.000127   1.239484   1.113321
------------------------------------------------------------------------------
(Outcome intuse==0 is the comparison group)

. predict pr0 pr1
(option p assumed; predicted probabilities)
(1 missing value generated)

. plot pr1 age

  .57159 +
         | *
         |  ******
         |        ********
    P    |                ******
    r    |                      *****
    (    |                           ***
    i    |                              ***
    n    |                                 ***
    t    |                                    **
    u    |                                      **
    s    |                                        **
    e    |                                          **
    =    |                                            **
    =    |                                              **
    1    |                                                *
    )    |                                                 **
         |                                                   ***
         |                                                      **
         |                                                        ****
   .0158 +                                                            ******
          +----------------------------------------------------------------+
               16                       age                             80