{smcl}
{.-}
help for {cmd:polychoric} and {cmd:polychoricpca} {right:author: {browse "http://www.komkon.org/~tacik/stata/":Stas Kolenikov}}
{.-}
{title:Polychoric and polyserial correlations}
{p 8 27}
{cmd:polychoric}
{it:varlist}
[{it:weight}]
[{cmd:if} {it:exp}] [{cmd:in} {it:range}]
[{cmd:,}
{cmd:pw}
{cmdab:verb:ose}
{cmd:nolog}
{cmd:dots}
]
{p 8 27}
{cmd:polychoricpca}
{it:varlist}
[{it:weight}]
[{cmd:if} {it:exp}] [{cmd:in} {it:range}]
[{cmd:,}
{cmdab:sc:ore}{cmd:(}{it:prefix}{cmd:)}
{cmdab:nsc:ore}{cmd:(}{it:#}{cmd:)}
]
{title:Description}
{p}{cmd:polychoric} estimates polychoric and polyserial correlations,
and {cmd:polychoricpca} performs the principal component analysis on
the resulting correlation matrix. The current version (1.4) of the
routine requires Stata 8.2.
{p}The polychoric correlation of two ordinal variables is derived as follows.
Suppose each of the ordinal variables was obtained by categorizing a normally
distributed underlying variable, and those two unobserved variables follow
a bivariate normal distribution. Then the (maximum likelihood) estimate
of that correlation is the polychoric correlation. If each of the ordinal
variables has only two categories, then the correlation between the two
variables is referred to as tetrachoric.
{p}A closely related concept is that of a polyserial correlation. It is defined
in a similar manner when one variable is continuous (assumed normal) and
the other is ordinal. If the ordinal variable has only two categories, then
the correlation is referred to as biserial.
{p}If the number of categories of one of the variables is greater than
10, {cmd:polychoric} treats it as continuous, so the correlation of two
variables that have more than 10 categories each would be simply the usual
Pearson moment correlation found through {help correlate}.
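{p}A quick way to see which side of this threshold a variable falls on is to
count its distinct values. The lines below are a minimal sketch, not part of
{cmd:polychoric} itself. They assume that the auto data is in memory and that
{cmd:polychoric} counts categories as the distinct nonmissing values reported
by {help tabulate}; a count above 10 would then suggest that the variable is
treated as continuous.{p_end}
{com}. * assumption: categories = distinct nonmissing values, as counted by tabulate
{com}. quietly tabulate rep78
{com}. display r(r)
{com}. quietly tabulate mpg
{com}. display r(r)
{txt}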
{p}Make sure you read {bf:Remarks} about the known problems
at the end of this help file! If you are coming from the development/health
economics research literature, you would also benefit from having
a look at our paper on polychoric PCA (see {bf:Reference}).
{title:Options of {cmd:polychoric}}
{p 0 4}{cmd:dots} entertains the user by displaying dots for each
estimated correlation.
{p 0 4}{cmd:nolog} suppresses the log from the maximum likelihood estimation.
{p 0 4}{cmd:pw} fills the entries of the correlation matrix with the
pairwise correlation. If this option is not specified, then, similarly
to {help correlate}, it uses the same subsample for all of the
correlations.
{p 0 4}{cmd:verbose} displays, for each estimated correlation, the
names of the variables and the type of the estimated correlation
(polychoric, polyserial, or Pearson moment correlation).
{cmd:polychoric} will default to this option if there are only
two input variables. If there are more than two variables,
{cmd:polychoric} will not show anything, so you would need
to retrieve the returned values (see below).
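{p}As an illustration of how these options combine, the lines below sketch
two calls on three variables, assuming the auto data is in memory. Only the
options documented above are used. Because {it:rep78} has missing values,
{cmd:pw} lets the other pairs keep the observations that {it:rep78} would
otherwise exclude.{p_end}
{com}. * pairwise samples, no ml log, one dot per estimated correlation
{com}. polychoric foreign mpg rep78, pw nolog dots
{com}. * with more than two variables, request the per-pair details explicitly
{com}. polychoric foreign mpg rep78, pw verbose nolog
{txt}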
{title:Options of {cmd:polychoricpca}}
{p 0 4}{cmd:score} is the prefix for the variables to be generated
to contain the principal component scores.
{p 0 4}{cmd:nscore} specifies the number of score variables to be generated.
{cmd:polychoricpca} will show the output from the first three eigenvalues,
at most.
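{p}A minimal sketch of requesting scores, assuming the auto data is in memory.
The prefix {it:pc} is an arbitrary choice made for this illustration. The exact
names of the generated variables are not stated above, so the second command
simply lists whatever variables start with that prefix.{p_end}
{com}. * pc is a hypothetical prefix; request scores on the first two components
{com}. polychoricpca foreign mpg rep78, score(pc) nscore(2)
{com}. * inspect the variables that were created with the prefix
{com}. describe pc*
{txt}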
{title:Returned values}
{cmd:polychoric} sets the following set of {help return} values.
{p 0 4}{cmd:r(R)} (matrix) is the estimated correlation matrix{p_end}
{p 0 4}{cmd:r(type)} (local) is the type of estimated correlation, one of
{it:polychoric}, {it:polyserial}, or {it:Pearson}{p_end}
{p 0 4}{cmd:r(rho)} is the estimated correlation{p_end}
{p 0 4}{cmd:r(se_rho)} is the estimated standard error of the correlation{p_end}
{p 0 4}{cmd:r(N)} is the number of observations used{p_end}
{p 0 4}{cmd:r(LR0)} and {cmd:r(pLR0)} are the test statistic and the p-value
of the likelihood ratio test of no correlation{p_end}
{p}In addition, if both variables are ordinal, specification tests
of normality are performed. They compare the empirical proportions of
the cells with the theoretical ones implied by bivariate normality and
the estimated polychoric correlation. The tests are not available
for the 2x2 case, as they have zero degrees of freedom.
The returned results are:
{p 0 4}{cmd:r(X2)}, {cmd:r(dfX2)} and {cmd:r(pX2)} are the observed
test statistic, degrees of freedom, and the corresponding p-value of the Pearson chi-square test;{p_end}
{p 0 4}{cmd:r(G2)}, {cmd:r(dfG2)} and {cmd:r(pG2)} are the observed
test statistic, degrees of freedom, and the corresponding p-value of the
likelihood ratio test.{p_end}
{p}If there are more than two input variables, then the returned values
correspond to the last estimated pair, in a manner similar to
{help correlate}.
{p}{cmd:polychoricpca} returns the matrices of eigenvectors, eigenvalues,
and the correlation matrix, as well as a few largest eigenvalues corresponding
to the number of scores requested.
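{p}The returned values are best copied right after the command, before another
r-class command overwrites them. The lines below are a sketch under stated
assumptions: the auto data is in memory, the scalar and matrix names
({it:rho_hat}, {it:se_hat}, {it:R}) are made up for this illustration, and the
interval is a rough normal approximation with the 1.96 multiplier rather than
something {cmd:polychoric} computes.{p_end}
{com}. polychoric rep78 foreign
{com}. * copy the estimate and its standard error into scalars (hypothetical names)
{com}. scalar rho_hat = r(rho)
{com}. scalar se_hat = r(se_rho)
{com}. * rough 95% interval; the normal approximation is an assumption
{com}. display rho_hat - 1.96*se_hat " to " rho_hat + 1.96*se_hat
{com}. * with several variables, keep the full correlation matrix
{com}. polychoric foreign mpg rep78
{com}. matrix R = r(R)
{com}. matrix list R
{txt}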
{title:Example}
{.-}
{com}. use c:\stata8\auto
{txt}(1978 Automobile Data)
{com}. polychoric rep78 foreign
{txt}Variables : {res}rep78 foreign
{txt}Type : {res}polychoric
{txt}Rho = {res}.80668059
{txt}S.e. = {res}.07631279
{txt}Goodness of fit tests:
Pearson G2 = {res}.43127115{txt}, Prob( >chi2({res}3{txt})) = {res}.93370948
{txt}LR X2 = {res}.38908216{txt}, Prob( >chi2({res}3{txt})) = {res}.94248852
{txt}
{com}. return list
{txt}scalars:
r(pLR0) = {res}5.12057153705e-08
{txt}r(LR0) = {res}29.67059428252011
{txt}r(pX2) = {res}.9424885157334509
{txt}r(dfX2) = {res}3
{txt}r(X2) = {res}.3890821586898692
{txt}r(pG2) = {res}.9337094786275901
{txt}r(dfG2) = {res}3
{txt}r(G2) = {res}.4312711544473018
{txt}r(se_rho) = {res}.0763127851819864
{txt}r(rho) = {res}.8066805935187174
{txt}r(N) = {res}69
{txt}r(sumw) = {res}69
{txt}macros:
r(type) : "{res}polychoric{txt}"
matrices:
r(R) : {res} 2 x 2
{txt}
{com}. polychoric foreign mpg
{txt}Variables : {res}foreign mpg
{txt}Type : {res}polyserial
{txt}Rho = {res}.48603372
{txt}S.e. = {res}.11286311
{txt}
{com}. polychoricpca foreign mpg rep78
{txt} k {c |} Eigenvalues {c |} Proportion explained {c |} Cum. explained
{dup 4:{c -}}{c +}{dup 15:{c -}}{c +}{dup 24:{c -}}{c +}{dup 18:{c -}}
{res} 1{txt} {c |} {res} 2.206757{col 21}{txt}{c |} {res} 0.735586{col 46}{txt}{c |} {res}0.735586
2{txt} {c |} {res} 0.615445{col 21}{txt}{c |} {res} 0.205148{col 46}{txt}{c |} {res}0.940734
3{txt} {c |} {res} 0.177798{col 21}{txt}{c |} {res} 0.059266{col 46}{txt}{c |} {res}1.000000
{txt}
{com}. return list
{txt}scalars:
r(lambda3) = {res}.1777976956026297
{txt}r(lambda2) = {res}.6154453299437229
{txt}r(lambda1) = {res}2.206756974453646
{txt}matrices:
r(R) : {res} 3 x 3
{txt}r(eigenvectors) : {res} 3 x 3
{txt}r(eigenvalues) : {res} 1 x 3
{txt}
{com}. matrix list r(R)
{txt}symmetric r(R)[3,3]
foreign mpg rep78
foreign {res} 1
{txt} mpg {res}.55443556 1
{txt} rep78 {res}.80668065 .42655387 1
{txt}
{.-}
{title:Remarks}
{p}{cmd:polychoric} is a bit sloppy with options. It assumes
the user might want to specify some {help maximize:maximization options}
for the {help ml} command, so anything it does not recognize as its
own option is passed on to {cmd:ml}. That may cause
an error in the latter.
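{p}The two calls below sketch what this means in practice, assuming the auto
data is in memory. {cmd:iterate()} is a standard maximization option; that it
reaches {cmd:ml} unchanged is an assumption based on the description above.
{it:dotz} is a deliberately misspelled, hypothetical option that shows how a
typo takes the same route.{p_end}
{com}. * iterate() is not an option of polychoric; assumed to be handed over to ml
{com}. polychoric rep78 foreign, iterate(50)
{com}. * dotz is a made-up misspelling: any resulting error would come from ml
{com}. polychoric rep78 foreign, dotz
{txt}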
{p}The standard error for the Pearson moment correlation does not
account for weights properly. That will be fixed later if anybody
needs that standard error.
{title:Reference}
{p 0 4}{bind:}Kolenikov, S., and Angeles, G. (2004). The Use of Discrete Data
in Principal Component Analysis With Applications to Socio-Economic Indices.
CPC/MEASURE Working paper No. WP-04-85.
{browse "https://www.cpc.unc.edu/measure/publications/pdf/wp-04-85.pdf":Full text in PDF format}
{p_end}
{title:Also see}
{p 0 21}{bind:}Online: help for {help correlate}, {help tetrac} (if installed)
{p_end}
{p 0 21}{bind:} Internet: {browse "http://www.google.com/search?q=polychoric%20correlation":Google search}{p_end}
{title:Contact}
Stas Kolenikov, skolenik@unc.edu