{smcl}
{* 28nov2005}{...}
{hline}
help for {hi:micombine}{right:(SJ5-4: st0067_2; SJ5-2: st0067_1; SJ4-3: st0067)}
{hline}

{title:Estimation of regression models with multiply imputed samples}


{p 8 18 2}
{cmd:micombine}
{{it:supported_regression_cmd} | {it:other_regression_cmd}}
[{it:yvar}]
[{it:covarlist}]
[{it:other_stuff}]
{ifin}
{weight}
[{cmd:,}
{cmd:br}
{cmdab:nocons:tant}
{cmdab:det:ail}
{cmdab:ef:orm}[{cmd:(}{it:string}{cmd:)}]
{cmdab:g:enxb(}{it:newvarname}{cmd:)}
{cmdab:imp:id(}{it:varname}{cmd:)}
{cmd:lrr}
{cmdab:nowar:ning}
{cmdab:obs:id(}{it:varname}{cmd:)}
{it:regression_cmd_options}]

{p 4 4 2}
where

{p 8 8 2}
{it:supported_regression_cmd}s are
{helpb clogit},
{helpb cnreg},
{helpb glm},
{helpb logistic},
{helpb logit},
{helpb mlogit},
{helpb ologit},
{helpb oprobit},
{helpb poisson},
{helpb probit},
{helpb qreg},
{helpb regress},
{helpb rreg},
{helpb stcox},
{helpb streg},
or
{helpb xtgee}, and {it:other_regression_cmd} is any other Stata regression command
(see {it:Remarks}).

{p 4 4 2}
{cmd:micombine} shares a subset of the features of all {help estcom:estimation commands};
see {it:Remarks}.

{p 4 4 2}
All weight types supported by {it:regression_cmd} are allowed; see
{help weight}.


{title:Description}

{p 4 4 2}
{cmd:micombine} estimates the parameters of a regression model whose
type is determined by {it:supported_regression_cmd} or {it:other_regression_cmd}.
Parameter estimates are combined
across several replicates obtained previously by multiple imputation,
e.g. by using {helpb ice} to create a file of imputed data.
See {it:Remarks} for a brief account of how {cmd:micombine} combines
the estimates and obtains standard errors.


{title:Options}

{p 4 8 2}
{cmd:br} calculates degrees of freedom and tests of significance for each predictor
according to formulae (3)-(5) of Barnard and Rubin (1999).
After estimation, the required degrees of freedom are stored in a matrix
(column vector) {cmd:e(nutilde)}. Note that if {cmd:test}
is used after {cmd:micombine} for significance testing of regression
coefficients, such tests assume that the degrees of freedom are
equal to the number of observations minus the number of parameters
estimated, not those given in {cmd:e(nutilde)}.

{p 4 8 2}
{cmd:noconstant} suppresses the regression constant in all regressions.

{p 4 8 2}
{cmd:detail} gives details of the regression model for each imputation.

{p 4 8 2}
{cmd:eform}[{cmd:(}{it:string}{cmd:)}] specifies that the exponentiated form
of the coefficients be output and that the constant not be reported.
The exponentiated coefficients are labeled {cmd:exp(b)}, unless
the optional {it:string} is used.

{p 4 8 2}
{cmd:genxb(}{it:newvarname}{cmd:)} creates {it:newvarname} to hold the
linear predictor from each regression model, averaged over all the
imputations.

{p 4 8 2}
{cmd:impid(}{it:varname}{cmd:)} specifies that {it:varname} is the variable
identifying the imputations. The number of imputations is determined as
the number of unique values of {it:varname}. All observations for which
{it:varname} takes the value zero are ignored in the analysis.
The default {it:varname} is {cmd:_j}.

{p 4 8 2}
{cmd:lrr} specifies that the Li-Raghunathan-Rubin (LRR) robust estimate of the
variance-covariance matrix of the regression coefficients be used.

{p 4 8 2}
{cmd:nowarning} suppresses the warning message about the use
of {it:other_regression_cmd}s (see {it:Remarks}).

{p 4 8 2}
{cmd:obsid(}{it:varname}{cmd:)} is provided to allow {cmd:micombine} to analyze
datasets created by programs other than {cmd:ice}. {it:varname} specifies the name
of a variable holding the "observation ID", i.e. the sequence number of each
observation in a given imputation. The number of observations should
be identical between imputations, as should the order of the observations.
{it:varname} should run 1,...,N for imputation 1, 1,...,N for imputation 2, and
so on. {cmd:ice} automatically stores this information with the data, so the
option is not required for datasets created by {cmd:ice}.
The default {it:varname} is {cmd:_i}.

{p 4 8 2}
{it:regression_cmd_options} may be any of the options appropriate to
{it:regression_cmd}.


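{p 4 4 2}
As an illustrative sketch of how some of these options might be used, the
commands below request Barnard-Rubin degrees of freedom and list the stored
matrix, report exponentiated coefficients under a custom label, and analyze a
stacked file of imputations created by a program other than {cmd:ice}. The
variable names {cmd:y}, {cmd:d}, {cmd:x1}, {cmd:x2}, {cmd:x3}, {cmd:imp} and
{cmd:id} are hypothetical ({cmd:d} is assumed to be a binary outcome, and
{cmd:imp} and {cmd:id} are assumed to hold the imputation number and the
observation ID); they are not names required by {cmd:micombine}.

{p 4 8 2}{cmd:. micombine regress y x1 x2 x3, br}{p_end}
{p 4 8 2}{cmd:. matrix list e(nutilde)}{p_end}
{p 4 8 2}{cmd:. micombine logit d x1 x2 x3, eform(OR)}{p_end}
{p 4 8 2}{cmd:. micombine regress y x1 x2 x3, impid(imp) obsid(id)}{p_end}
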
{title:Remarks}

{p 4 4 2}
Details of statistical inference from multiply imputed datasets are nicely
described in a recent Stata Journal article by John Carlin and colleagues
(Carlin et al. 2003). Here, with due acknowledgment to John, I give an edited
version of section 2 of his article.

{p 4 4 2}
A simple method of combining estimates from several models was derived by
Rubin (1987). Suppose initially that primary interest lies in estimating a
scalar quantity, Q. Here, Q is a regression coefficient, for example, the log
hazard ratio in a proportional hazards model. Suppose that we have imputed m
complete datasets using an appropriate model. In each dataset, standard
complete-data methods are used to obtain an estimate of Q with an associated
standard error. Let Q(k) and U(k) denote the point estimate and variance
respectively from the kth (k = 1, 2, ... , m) dataset. The point estimate Q^
of Q from multiple imputation is simply the arithmetic mean of Q(1),...,Q(m).

{p 4 4 2}
Obtaining a valid standard error for the estimate Q^ requires combining
information on within-imputation and between-imputation variation. The latter
is important in reflecting uncertainty due to variability between imputation
samples. First, a within-imputation variance component, W, is obtained as the
mean of the complete-data variance estimates, U(1),...,U(m). Second, a
between-imputation variance component, B, is calculated as the sum of squares
of Q(1),...,Q(m) about Q^, divided by m-1. The (total) variance T of Q^ is
given by

{p 8 12 2}
T = W + B * (1 + 1/m)

{p 4 4 2}
Rubin (1987) showed that (Q - Q^)/sqrt(T) is distributed approximately
as Student's t on nu degrees of freedom, where

{p 8 12 2}
nu = (m - 1) * (1 + W /(B * (1 + 1/m)))^2

{p 4 4 2}
The (1 + 1/m) term in these expressions indicates that it is not necessary to
create a large number of imputed datasets, particularly when B is much smaller
than W. This condition will be satisfied unless there is much missing data and
the parameter estimates within each dataset are very precise.

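{p 4 4 2}
The combination rules can be followed by hand in a minimal worked sketch using
Stata scalars. The three point estimates (0.50, 0.60, 0.55) and variances
(0.010, 0.012, 0.011) are made-up values for m = 3 imputations, not output from
{cmd:micombine}, and the scalar names are arbitrary. The sketch assumes that no
variable in memory has a name matching, or abbreviating to, one of these
scalar names.

{p 4 8 2}{cmd:. scalar m = 3}{p_end}
{p 4 8 2}{cmd:. scalar Qhat = (0.50 + 0.60 + 0.55)/m}{p_end}
{p 4 8 2}{cmd:. scalar W = (0.010 + 0.012 + 0.011)/m}{p_end}
{p 4 8 2}{cmd:. scalar B = ((0.50 - Qhat)^2 + (0.60 - Qhat)^2 + (0.55 - Qhat)^2)/(m - 1)}{p_end}
{p 4 8 2}{cmd:. scalar T = W + B*(1 + 1/m)}{p_end}
{p 4 8 2}{cmd:. scalar nu = (m - 1)*(1 + W/(B*(1 + 1/m)))^2}{p_end}
{p 4 8 2}{cmd:. display "Qhat = " Qhat ", T = " T ", nu = " nu}{p_end}

{p 4 4 2}
With these values, Q^ = 0.55, W = 0.011 and B = 0.0025, so that T is about
0.0143 (a standard error of about 0.12) and nu is about 37.
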
{title:Available regression commands}

{p 4 4 2}
{cmd:micombine} has been tested with the commands listed under
{it:supported_regression_cmd} at the beginning of this help file.
{cmd:micombine} {it:may} work satisfactorily with {it:other_regression_cmd}s,
but this cannot be guaranteed. This facility is provided so that the
researcher familiar with a particular Stata command has a fighting chance of
obtaining correct MI estimates and standard errors. HOWEVER, THE AUTHOR
DISCLAIMS ALL RESPONSIBILITY FOR THE CORRECTNESS OF RESULTS ARISING FROM USE
OF AN {it:other_regression_cmd}. Note that {it:other_stuff} in the syntax
diagram is code that may be required by some {it:other_regression_cmd}s, for
example {cmd:ivreg} wants {cmd:(}{it:varlist2}{cmd: = }{it:varlist_iv}{cmd:)}.
{cmd:micombine} parses for the occurrence of an opening parenthesis. There may
be other syntaxes that are not accommodated by this approach; if so, please
contact the author with details.

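{p 4 4 2}
For instance, a call to an {it:other_regression_cmd} that needs
{it:other_stuff} might look like the following sketch. The variable names
{cmd:y}, {cmd:x1}, {cmd:x2}, {cmd:z1} and {cmd:z2} are hypothetical: {cmd:x2}
is taken to be the endogenous covariate and {cmd:z1} and {cmd:z2} its
instruments. The second form adds {cmd:nowarning} to suppress the warning
message; the results should still be checked carefully, as noted above.

{p 4 8 2}{cmd:. micombine ivreg y x1 (x2 = z1 z2)}{p_end}
{p 4 8 2}{cmd:. micombine ivreg y x1 (x2 = z1 z2), nowarning}{p_end}
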
{title:Postestimation prediction}

{p 4 4 2}
The {cmd:predict} command {it:may} work as you expect after {cmd:micombine},
but this feature should be regarded as under development and should be
treated with caution. {cmd:micombine} stores the quantities needed by
{cmd:predict} at the last execution of the regression command, that is at the
final imputation, but prediction following some regression commands has
non-standard features that are hard to emulate accurately.
Known issues are as follows:

{p 8 12 2}
1. After {cmd:micombine mlogit}: {cmd:predict} may require that the outcome
levels are known as 0, 1, 2, ... , so it may be necessary to drop the
value label for the outcome variable, if such a label is defined.
This is KNOWN to be a problem using {cmd:mfx} following {cmd:micombine mlogit}.
For example, {cmd:mfx compute, predict(outcome(0))} will work only if
the lowest level of the outcome is 0, and is not labeled.

{p 8 12 2}
2. After {cmd:micombine} with a restricted sample (i.e. using {cmd:if},
{cmd:in} or zero weights for some observations, or because some members
of {it:covarlist} still have missing values), the function
{cmd:e(sample)} is defined as you would expect it to be
only for the final imputation. In all earlier imputations it
is zero. Although not necessarily convenient for use of
{cmd:e(sample)} in data analysis, the behavior is correct for the
purposes of {cmd:predict}, since the relevant sample size and
estimation sample are properties of (any) one imputation,
but not of the complete assembly of imputations.


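{p 4 4 2}
The two known issues above might be handled along the following lines. This is
a sketch only: {cmd:outcome}, {cmd:x1} and {cmd:x2} are hypothetical variable
names, the lowest level of {cmd:outcome} is assumed to be 0, and the
imputation identifier is assumed to be the default {cmd:_j}. The first command
detaches any value label from {cmd:outcome} before estimation. The
{cmd:tabulate} command is one way to confirm that {cmd:e(sample)} is nonzero
only in the final imputation, as described in item 2.

{p 4 8 2}{cmd:. label values outcome}{p_end}
{p 4 8 2}{cmd:. micombine mlogit outcome x1 x2}{p_end}
{p 4 8 2}{cmd:. tabulate _j if e(sample)}{p_end}
{p 4 8 2}{cmd:. mfx compute, predict(outcome(0))}{p_end}
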
{title:Examples}

{p 4 8 2}{cmd:. ice y x1 x2 x3 using imp, m(10) genmiss(m_)}{p_end}
{p 4 8 2}{cmd:. use imp, clear}{p_end}
{p 4 8 2}{cmd:. micombine regress y x1 x2 x3}{p_end}
{p 4 8 2}{cmd:. stset time, failure(cens)}{p_end}
{p 4 8 2}{cmd:. micombine stcox x1 x2 x3, genxb(index)}{p_end}
{p 4 8 2}{cmd:. test x2==1}{p_end}
{p 4 8 2}{cmd:. testparm x1 x2}{p_end}


{title:Author}

{p 4 4 2}
Patrick Royston, MRC Clinical Trials Unit, London.
patrick.royston@ctu.mrc.ac.uk


{title:References}

{p 4 8 2}
Barnard, J. and D. B. Rubin. 1999. Small-sample degrees of freedom with
multiple imputation. {it:Biometrika} 86: 948-955.

{p 4 8 2}
Carlin, J. B., N. Li, P. Greenwood, and C. Coffey. 2003.
Tools for analyzing multiple imputed datasets. {it:Stata Journal} 3(3): 226-244.

{p 4 8 2}
Li, K., T. Raghunathan, and D. Rubin. 1991. Large sample significance levels
from multiply-imputed data using moment-based statistics and an F reference
distribution. {it:Journal of the American Statistical Association} 86: 1065-1073.

{p 4 8 2}
Rubin, D. 1987. {it:Multiple Imputation for Nonresponse in Surveys}. New York:
Wiley.

{p 4 8 2}
Schafer, J. 1997. {it:Analysis of Incomplete Multivariate Data}. London:
Chapman & Hall.

{p 4 8 2}
van Buuren, S., H. C. Boshuizen, and D. L. Knook. 1999. Multiple imputation of
missing blood pressure covariates in survival analysis.
{it:Statistics in Medicine} 18: 681-694.
(Also see http://www.multiple-imputation.com.)


{title:Also see}

{psee}
Online: {helpb ice}
{p_end}