{smcl}
{* 28nov2005}{...}
{hline}
help for {hi:micombine}{right:(SJ5-4: st0067_2; SJ5-2: st0067_1; SJ4-3: st0067)}
{hline}

{title:Estimation of regression models with multiply imputed samples}

{p 8 18 2}
{cmd:micombine}
{{it:supported_regression_cmd} | {it:other_regression_cmd}}
[{it:yvar}]
[{it:covarlist}]
[{it:other_stuff}]
{ifin}
{weight}
[{cmd:,}
{cmd:br}
{cmdab:nocons:tant}
{cmdab:det:ail}
{cmdab:ef:orm}[{cmd:(}{it:string}{cmd:)}]
{cmdab:g:enxb(}{it:newvarname}{cmd:)}
{cmdab:imp:id(}{it:varname}{cmd:)}
{cmd:lrr}
{cmdab:nowar:ning}
{cmdab:obs:id(}{it:varname}{cmd:)}
{it:regression_cmd_options}]

{p 4 4 2}
where

{p 8 8 2}
{it:supported_regression_cmd}s are
{helpb clogit}, {helpb cnreg}, {helpb glm}, {helpb logistic}, {helpb logit}, {helpb mlogit}, {helpb ologit}, {helpb oprobit},
{helpb poisson}, {helpb probit}, {helpb qreg}, {helpb regress}, {helpb rreg}, {helpb stcox}, {helpb streg}, or {helpb xtgee},
and {it:other_regression_cmd} is any other Stata regression command (see {it:Remarks}).

{p 4 4 2}
{cmd:micombine} shares a subset of the features of all {help estcom:estimation commands}; see {it:Remarks}.

{p 4 4 2}
All weight types supported by {it:regression_cmd} are allowed; see {help weight}.


{title:Description}

{p 4 4 2}
{cmd:micombine} estimates the parameters of a regression model whose type is determined by
{it:supported_regression_cmd} or {it:other_regression_cmd}.
Parameter estimates are combined across several replicates obtained previously by multiple imputation,
e.g. by using {helpb ice} to create a file of imputed data.
See {it:Remarks} for a brief account of how {cmd:micombine} combines the estimates and obtains standard errors.


{title:Options}

{p 4 8 2}
{cmd:br} calculates degrees of freedom and tests of significance for each predictor
according to the formulae (3)-(5) of Barnard & Rubin (1999).
After estimation, the required degrees of freedom are stored in a matrix (column vector) {cmd:e(nutilde)}.
Note that if {cmd:test} is used after {cmd:micombine} for significance testing of regression coefficients,
such tests assume that the degrees of freedom are equal to the number of observations
minus the number of parameters estimated, not those given in {cmd:e(nutilde)}.

{p 4 8 2}
{cmd:noconstant} suppresses the regression constant in all regressions.

{p 4 8 2}
{cmd:detail} gives details of the regression model for each imputation.

{p 4 8 2}
{cmd:eform}[{cmd:(}{it:string}{cmd:)}] specifies that the exponentiated form of the coefficients
be output and that the constant not be reported.
The exponentiated coefficients are labeled {cmd:exp(b)}, unless the optional {it:string} is used.

{p 4 8 2}
{cmd:genxb(}{it:newvarname}{cmd:)} creates {it:newvarname} to hold the linear predictor
from each regression model, averaged over all the imputations.

{p 4 8 2}
{cmd:impid(}{it:varname}{cmd:)} specifies that {it:varname} is the variable identifying the imputations.
The number of imputations is determined as the number of unique values of {it:varname}.
All observations for which {it:varname} takes the value zero are ignored in the analysis.
The default {it:varname} is {cmd:_j}.

{p 4 8 2}
{cmd:lrr} specifies that the Li-Raghunathan-Rubin (LRR) robust estimate of the
variance-covariance matrix of the regression coefficients be used.

{p 4 8 2}
{cmd:nowarning} suppresses the warning message about the use of {it:other_regression_cmd}s (see {it:Remarks}).

{p 4 8 2}
{cmd:obsid(}{it:varname}{cmd:)} is provided to allow {cmd:micombine} to analyze datasets
created by programs other than {cmd:ice}.
{it:varname} specifies the name of a variable holding the "observation ID",
i.e. the sequence number of each observation in a given imputation.
The number of observations should be identical between imputations, as should the order of the observations.
{it:varname} should run 1,...,N for imputation 1, 1,...,N for imputation 2, and so on.
{cmd:ice} automatically stores the information with the data, so this option is not required.
The default {it:varname} is {cmd:_i}.

{p 4 8 2}
{it:regression_cmd_options} may be any of the options appropriate to {it:regression_cmd}.


{title:Remarks}

{p 4 4 2}
Details of statistical inference from multiple imputed datasets are nicely described
in a recent Stata Journal article by John Carlin and colleagues (Carlin et al. 2003).
Here, with due acknowledgment to John, I give an edited version of section 2 of his article.

{p 4 4 2}
A simple method of combining estimates from several models was derived by Rubin (1987).
Suppose initially that primary interest lies in estimating a scalar quantity, Q.
Here, Q is a regression coefficient, for example, the log hazard ratio in a proportional hazards model.
Suppose that we have imputed m complete datasets using an appropriate model.
In each dataset, standard complete-data methods are used to obtain an estimate of Q with an associated standard error.
Let Q(k) and U(k) denote the point estimate and variance, respectively, from the kth (k = 1, 2, ... , m) dataset.
The point estimate Q^ of Q from multiple imputation is simply the arithmetic mean of Q(1),...,Q(m).

{p 4 4 2}
Obtaining a valid standard error for this estimate of Q requires combining information
on within-imputation and between-imputation variation.
The latter is important in reflecting uncertainty due to variability between imputation samples.
First, a within-imputation variance component, W, is obtained as the mean of the
complete-data variance estimates, U(1),...,U(m).
Second, a between-imputation variance component, B, is calculated as the sum of squares
of Q(1),...,Q(m) about Q^, divided by m-1.
The (total) variance T of Q^ is given by

{p 8 12 2}
T = W + B * (1 + 1/m)

{p 4 4 2}
Rubin (1987) showed that (Q - Q^)/sqrt(T) is distributed approximately as Student's t
on nu degrees of freedom, where

{p 8 12 2}
nu = (m - 1) * (1 + W /(B * (1 + 1/m)))^2

{p 4 4 2}
The (1 + 1/m) term in these expressions indicates that it is not necessary to create
a large number of imputed datasets, particularly when B is much smaller than W.
This condition will be satisfied unless there is much missing data and the
parameter estimates within each dataset are very precise.


{title:Available regression commands}

{p 4 4 2}
{cmd:micombine} has been tested with the commands listed under {it:supported_regression_cmd}
at the beginning of this help file.
{cmd:micombine} {it:may} work satisfactorily with {it:other_regression_cmd}s, but this cannot be guaranteed.
This facility is provided so that the researcher familiar with a particular Stata command
has a fighting chance of obtaining correct MI estimates and standard errors.
HOWEVER, THE AUTHOR DISCLAIMS ALL RESPONSIBILITY FOR THE CORRECTNESS OF RESULTS
ARISING FROM USE OF AN {it:other_regression_cmd}.
Note that {it:other_stuff} in the syntax diagram is code that may be required by some
{it:other_regression_cmd}s; for example, {cmd:ivreg} wants
{cmd:(}{it:varlist2}{cmd: = }{it:varlist_iv}{cmd:)}.
{cmd:micombine} parses for the occurrence of an opening parenthesis.
There may be other syntaxes that are not accommodated by this approach;
if so, please contact the author with details.


{title:Postestimation prediction}

{p 4 4 2}
The {cmd:predict} command {it:may} work as you expect after {cmd:micombine},
but this feature should be regarded as under development and should be treated with caution.
{cmd:micombine} stores the quantities needed by {cmd:predict} at the last execution of the
regression command, that is, at the final imputation, but prediction following some
regression commands has nonstandard features that are hard to emulate accurately.
Known issues are as follows:

{p 8 12 2}
1. After {cmd:micombine mlogit}: {cmd:predict} may require that the outcome levels are known
as 0, 1, 2, ... , so it may be necessary to drop the value label for the outcome variable,
if such a label is defined.
This is KNOWN to be a problem using {cmd:mfx} following {cmd:micombine mlogit}.
For example, {cmd:mfx compute, predict(outcome(0))} will work only if the lowest level
of the outcome is 0 and is not labeled.

{p 8 12 2}
2. After {cmd:micombine} with a restricted sample (i.e. when using {cmd:if}, {cmd:in}, or zero
weights for some observations, or when some members of {it:covarlist} still have missing values),
the system variable {cmd:e(sample)} is defined as you would expect it to be only for the final imputation.
In all earlier imputations it is zero.
Although not necessarily convenient for use of {cmd:e(sample)} in data analysis,
the behavior is correct for the purposes of {cmd:predict}, since the relevant sample size
and estimation sample are properties of (any) one imputation, but not of the complete
assembly of imputations.


{title:Examples}

{p 4 8 2}{cmd:. ice y x1 x2 x3 using imp, m(10) genmiss(m_)}{p_end}
{p 4 8 2}{cmd:. use imp, clear}{p_end}
{p 4 8 2}{cmd:. micombine regress y x1 x2 x3}{p_end}

{p 4 8 2}{cmd:. stset time, failure(cens)}{p_end}
{p 4 8 2}{cmd:. micombine stcox x1 x2 x3, genxb(index)}{p_end}
{p 4 8 2}{cmd:. test x2==1}{p_end}
{p 4 8 2}{cmd:. testparm x1 x2}{p_end}


{title:Author}

{p 4 4 2}
Patrick Royston, MRC Clinical Trials Unit, London.
patrick.royston@ctu.mrc.ac.uk


{title:References}

{p 4 8 2}
Barnard, J. and D. B. Rubin. 1999. Small-sample degrees of freedom with multiple imputation.
{it:Biometrika} 86: 948-955.

{p 4 8 2}
Carlin, J. B., N. Li, P.
Greenwood, and C. Coffey. 2003. Tools for analyzing multiple imputed datasets.
{it:Stata Journal} 3(3): 226-244.

{p 4 8 2}
Li, K., T. Raghunathan, and D. Rubin. 1991. Large sample significance levels from
multiply-imputed data using moment-based statistics and an F reference distribution.
{it:Journal of the American Statistical Association} 86: 1065-1073.

{p 4 8 2}
Rubin, D. 1987. {it:Multiple Imputation for Nonresponse in Surveys}. New York: Wiley.

{p 4 8 2}
Schafer, J. 1997. {it:Analysis of Incomplete Multivariate Data}. London: Chapman & Hall.

{p 4 8 2}
van Buuren, S., H. C. Boshuizen, and D. L. Knook. 1999. Multiple imputation of missing
blood pressure covariates in survival analysis.
{it:Statistics in Medicine} 18: 681-694.
(Also see http://www.multiple-imputation.com.)


{title:Also see}

{psee}
Online: {helpb ice}
{p_end}