{smcl}
{.-}
help for {cmd:polychoric} and {cmd:polychoricpca} {right:author: {browse "http://www.komkon.org/~tacik/stata/":Stas Kolenikov}}
{.-}

{title:Polychoric and polyserial correlations}

{p 8 27}
{cmd:polychoric}
{it:varlist}
[{it:weight}]
[{cmd:if} {it:exp}] [{cmd:in} {it:range}]
[{cmd:,}
{cmd:pw}
{cmdab:verb:ose}
{cmd:nolog}
{cmd:dots}
]

{p 8 27}
{cmd:polychoricpca}
{it:varlist}
[{it:weight}]
[{cmd:if} {it:exp}] [{cmd:in} {it:range}]
[{cmd:,}
{cmdab:sc:ore}{cmd:(}{it:prefix}{cmd:)}
{cmdab:nsc:ore}{cmd:(}{it:#}{cmd:)}
]

{title:Description}

{p}{cmd:polychoric} estimates polychoric and polyserial correlations,
and {cmd:polychoricpca} performs principal component analysis on
the resulting correlation matrix. The current version (1.4) of the
routine requires Stata 8.2.

{p}The polychoric correlation of two ordinal variables is derived as follows.
Suppose each of the ordinal variables was obtained by categorizing a normally
distributed underlying variable, and those two unobserved variables follow
a bivariate normal distribution. Then the (maximum likelihood) estimate
of the correlation between the two underlying variables is the polychoric
correlation. If each of the ordinal
variables has only two categories, then the correlation between the two
variables is referred to as tetrachoric.

{p}A closely related concept is that of a polyserial correlation. It is defined
in a similar manner when one variable is continuous (assumed normal) and
the other is ordinal. If the latter has only two categories, then
the correlation is referred to as biserial.

{p}If the number of categories of one of the variables is greater than
10, {cmd:polychoric} treats it as continuous, so the correlation of two
variables that have more than 10 categories each is simply the usual
Pearson moment correlation found through {help correlate}.

{p}Make sure you read the {bf:Remarks} about the known problems
at the end of this help file! If you are coming from the development/health
economics research literature, you would also benefit from having
a look at our paper on polychoric PCA (see {bf:Reference}).

{title:Options of {cmd:polychoric}}

{p 0 4}{cmd:dots} entertains the user by displaying a dot for each
estimated correlation.

{p 0 4}{cmd:nolog} suppresses the log from the maximum likelihood estimation.

{p 0 4}{cmd:pw} fills the entries of the correlation matrix with the
pairwise correlations. If this option is not specified, then, similarly
to {help correlate}, the same subsample is used for all of the
correlations.

{p 0 4}{cmd:verbose} displays, for each estimated correlation, the
names of the variables and the type of the estimated correlation
(polychoric, polyserial, or Pearson moment correlation).
{cmd:polychoric} defaults to this option if there are only
two input variables. If there are more than two variables,
{cmd:polychoric} will not show anything by default, so you would need
to use the returned values (see {bf:Returned values} below).{p_end}

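{p}The options above can be combined. What follows is a minimal sketch, assuming
the {cmd:auto} data used in the {bf:Example} section are already in memory. The
choice of variables is for illustration only, and no output is shown.{p_end}

{.-}
{com}. polychoric rep78 foreign mpg, pw nolog
{com}. matrix list r(R)
{com}. polychoric rep78 foreign mpg, verbose
{txt}
{.-}

{p}With more than two variables and no {cmd:verbose}, the first call is silent,
so the pairwise correlation matrix is read from {cmd:r(R)}. The last call asks for
the variable names and the correlation type to be reported for every pair.{p_end}
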
{title:Options of {cmd:polychoricpca}}

{p 0 4}{cmd:score} specifies the prefix for the names of the variables to be generated
to contain the principal component scores.

{p 0 4}{cmd:nscore} specifies the number of score variables to be generated.
{cmd:polychoricpca} will show the output for the first three eigenvalues,
at most.{p_end}

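{p}A minimal sketch of requesting scores, assuming the {cmd:auto} data are in
memory. The prefix {cmd:pc} is an arbitrary name chosen for illustration, and
{cmd:describe pc*} simply lists whatever variables now start with that prefix.
No output is shown.{p_end}

{.-}
{com}. polychoricpca foreign mpg rep78, score(pc) nscore(2)
{com}. describe pc*
{txt}
{.-}
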
{title:Returned values}

{cmd:polychoric} sets the following {help return} values.

{p 0 4}{cmd:r(R)} (matrix) is the estimated correlation matrix{p_end}
{p 0 4}{cmd:r(type)} (local) is the type of estimated correlation, one of
{it:polychoric}, {it:polyserial}, or {it:Pearson}{p_end}
{p 0 4}{cmd:r(rho)} is the estimated correlation{p_end}
{p 0 4}{cmd:r(se_rho)} is the estimated standard error of the correlation{p_end}
{p 0 4}{cmd:r(N)} is the number of observations used{p_end}
{p 0 4}{cmd:r(LR0)} and {cmd:r(pLR0)} are the test statistic and the p-value of the likelihood ratio
test of no correlation

{p}In addition, if both variables are ordinal, specification tests
of normality are performed. They compare the empirical proportions of
the cells with the theoretical ones implied by normality and the
estimated polychoric correlation. The tests are not available
for the 2x2 case, as they would have zero degrees of freedom.
The returned results are:

{p 0 4}{cmd:r(X2)}, {cmd:r(dfX2)} and {cmd:r(pX2)} are the observed
test statistic, degrees of freedom, and the corresponding p-value of the Pearson chi-square test;{p_end}
{p 0 4}{cmd:r(G2)}, {cmd:r(dfG2)} and {cmd:r(pG2)} are the observed
test statistic, degrees of freedom, and the corresponding p-value of the
likelihood ratio test.{p_end}

{p}If there are more than two input variables, then the returned values
correspond to the last estimated pair, in a manner similar to
{help correlate}.

{p}{cmd:polychoricpca} returns the matrices of eigenvectors, eigenvalues,
and the correlation matrix, as well as the largest eigenvalues, in a number corresponding
to the number of scores requested.{p_end}

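{p}The returned values can be used in further calculations. What follows is a
minimal sketch, assuming the {cmd:auto} data are in memory. The 1.96 multiplier
gives a rough normal-approximation 95% interval for the correlation; it is an
illustration and not something {cmd:polychoric} computes itself. The matrix name
{cmd:R} is arbitrary.{p_end}

{.-}
{com}. polychoric rep78 foreign
{com}. display r(rho) - 1.96*r(se_rho)
{com}. display r(rho) + 1.96*r(se_rho)
{com}. matrix R = r(R)
{com}. matrix list R
{txt}
{.-}

{p}Copying {cmd:r(R)} into a named matrix keeps the correlations available after
a later command overwrites the {cmd:r()} results.{p_end}
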
{title:Example}

{.-}
{com}. use c:\stata8\auto
{txt}(1978 Automobile Data)

{com}. polychoric rep78 foreign

{txt}Variables : {res}rep78 foreign
{txt}Type : {res}polychoric
{txt}Rho = {res}.80668059
{txt}S.e. = {res}.07631279
{txt}Goodness of fit tests:
Pearson G2 = {res}.43127115{txt}, Prob( >chi2({res}3{txt})) = {res}.93370948
{txt}LR X2 = {res}.38908216{txt}, Prob( >chi2({res}3{txt})) = {res}.94248852
{txt}
{com}. return list

{txt}scalars:
r(pLR0) = {res}5.12057153705e-08
{txt}r(LR0) = {res}29.67059428252011
{txt}r(pX2) = {res}.9424885157334509
{txt}r(dfX2) = {res}3
{txt}r(X2) = {res}.3890821586898692
{txt}r(pG2) = {res}.9337094786275901
{txt}r(dfG2) = {res}3
{txt}r(G2) = {res}.4312711544473018
{txt}r(se_rho) = {res}.0763127851819864
{txt}r(rho) = {res}.8066805935187174
{txt}r(N) = {res}69
{txt}r(sumw) = {res}69

{txt}macros:
r(type) : "{res}polychoric{txt}"

matrices:
r(R) : {res} 2 x 2
{txt}
{com}. polychoric foreign mpg

{txt}Variables : {res}foreign mpg
{txt}Type : {res}polyserial
{txt}Rho = {res}.48603372
{txt}S.e. = {res}.11286311
{txt}
{com}. polychoricpca foreign mpg rep78

{txt} k {c |} Eigenvalues {c |} Proportion explained {c |} Cum. explained
{dup 4:{c -}}{c +}{dup 15:{c -}}{c +}{dup 24:{c -}}{c +}{dup 18:{c -}}
{res} 1{txt} {c |} {res} 2.206757{col 21}{txt}{c |} {res} 0.735586{col 46}{txt}{c |} {res}0.735586
2{txt} {c |} {res} 0.615445{col 21}{txt}{c |} {res} 0.205148{col 46}{txt}{c |} {res}0.940734
3{txt} {c |} {res} 0.177798{col 21}{txt}{c |} {res} 0.059266{col 46}{txt}{c |} {res}1.000000
{txt}
{com}. return list

{txt}scalars:
r(lambda3) = {res}.1777976956026297
{txt}r(lambda2) = {res}.6154453299437229
{txt}r(lambda1) = {res}2.206756974453646

{txt}matrices:
r(R) : {res} 3 x 3
{txt}r(eigenvectors) : {res} 3 x 3
{txt}r(eigenvalues) : {res} 1 x 3
{txt}
{com}. matrix list r(R)

{txt}symmetric r(R)[3,3]
foreign mpg rep78
foreign {res} 1
{txt} mpg {res}.55443556 1
{txt} rep78 {res}.80668065 .42655387 1
{txt}
{.-}

{title:Remarks}

{p}{cmd:polychoric} is a bit sloppy with options. It assumes
the user might want to specify some {help maximize:maximization options}
for the {help ml} command, so anything it does not recognize as its
own option is passed on to {cmd:ml}. An unrecognized or misspelled option may
therefore cause an error in the latter.{p_end}

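{p}For instance, a standard {help maximize:maximization option} such as
{cmd:iterate()} is not listed among the options of {cmd:polychoric}, so, by the
pass-through behavior described above, it would be handed over to {cmd:ml}.
What follows is a minimal sketch, assuming the {cmd:auto} data are in memory; the limit of
50 iterations is an arbitrary value chosen for illustration.{p_end}

{.-}
{com}. polychoric rep78 foreign, iterate(50)
{txt}
{.-}
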
{p}The standard error for the Pearson moment correlation does not
account for weights properly. That will be fixed later if anybody
needs that standard error.

{title:Reference}

{p 0 4}{bind:}Kolenikov, S., and Angeles, G. (2004). The Use of Discrete Data
in Principal Component Analysis With Applications to Socio-Economic Indices.
CPC/MEASURE Working paper No. WP-04-85.
{browse "https://www.cpc.unc.edu/measure/publications/pdf/wp-04-85.pdf":Full text in PDF format}
{p_end}

{title:Also see}

{p 0 21}{bind:}Online: help for {help correlate}, {help tetrac} (if installed)
{p_end}
{p 0 21}{bind:} Internet: {browse "http://www.google.com/search?q=polychoric%20correlation":Google search}{p_end}

{title:Contact}

Stas Kolenikov, skolenik@unc.edu