{smcl} {.-} help for {cmd:polychoric} and {cmd:polychoricpca} {right:author: {browse "http://www.komkon.org/~tacik/stata/":Stas Kolenikov}} {.-} {title:Polychoric and polyserial correlations} {p 8 27} {cmd:polychoric} {it:varlist} [{it:weight}] [{cmd:if} {it:exp}] [{cmd:in} {it:range}] [{cmd:,} {cmd:pw} {cmdab:verb:ose} {cmd:nolog} {cmd:dots} ] {p 8 27} {cmd:polychoricpca} {it:varlist} [{it:weight}] [{cmd:if} {it:exp}] [{cmd:in} {it:range}] [{cmd:,} {cmdab:sc:ore}{cmd:(}{it:prefix}{cmd:)} {cmdab:nsc:ore}{cmd:(}{it:#}{cmd:)} ] {title:Description} {p}{cmd:polychoric} estimates polychoric and polyserial correlations, and {cmd:polychoricpca} performs the principal component analysis on the resulting correlation matrix. The current version (1.4) of the routine requires Stata 8.2. {p}The polychoric correlation of two ordinal variables is derived as follows. Suppose each of the ordinal variables was obtained by categorizing a normally distributed underlying variable, and those two unobserved variables follow a bivariate normal distribution. Then the (maximum likelihood) estimate of that correlation is the polychoric correlation. If each of the ordinal variables has only two categories, then the correlation between the two variables is referred to as tetrachoric. {p}A closely related concept is that of a polyserial correlation. It is defined in a similar manner when one variable is continuous (assumed normal) and an ordinal variable. If there are only two categories of the latter, then the correlation is referred to as biserial. {p}If the number of the categories of one of the variables is greater than 10, {cmd:polychoric} treats it is continuous, so the correlation of two variables that have 10 categories each would be simply the usual Pearson moment correlation found through {help correlate}. {p}Make sure you read {bf:Remarks} about the known problems in the end of this help file! If you are coming from development/health economics research literature, you would also benefit from having a look at our paper on polychoric PCA. {title:Options of {cmd:polychoric}} {p 0 4}{cmd:dots} entertains the user by displaing dots for each estimated correlation. {p 0 4}{cmd:nolog} suppresses the log from the maximum likelihood estimation. {p 0 4}{cmd:pw} fills the entries of the correlation matrix with the pairwise correlation. If this option is not specified, then, similarly to {help correlate}, it uses the same subsample for all of the correlations. {p 0 4}{cmd:verbose} for each estimated correlation displays the names of the variables, the type of the estimated correlation (polychoric, polyserial, or Pearson moment correlation). {cmd:polychoric} will default to this option if there are only two input variables. If there are more than two variables, {cmd:polychoric} will not show anything, so you would need to address the returned values (see below). {title:Options of {cmd:polychoricpca}} {p 0 4}{cmd:score} is the prefix for the variables to be generated to contain the principal component scores. {p 0 4}{cmd:nscore} specifies the number of score variables to be generated. {cmd:polychoricpca} will show the output from the first three eigenvalues, at most. {title:Returned values} {cmd:polychoric} sets the following set of {help return} values. {p 0 4}{cmd:r(R)} (matrix) is the estimated correlation matrix{p_end} {p 0 4}{cmd:r(type)} (local) is the type of estimated correlation, one of {it:polychoric}, {it:polyserial}, or {it:Pearson}{p_end} {p 0 4}{cmd:r(rho)} is the estimated correlation{p_end} {p 0 4}{cmd:r(se_rho)} is the estimated standard error of the correlation{p_end} {p 0 4}{cmd:r(N)} is the number of observations used{p_end} {p 0 4}{cmd:r(LR0)} and {cmd:r(pLR0)} are the results of the likelihood ratio test of no correlation {p}In addition, if both variables are ordinal, the specification tests on normality are performed that compare the empirical proportions of the cells with the theoretical ones implied by normality, together with estimated polychoric correlation. The tests are not available for a 2x2 case as the tests have zero degrees of freedom. The returned results are: {p 0 4}{cmd:r(X2)}, {cmd:r(dfX2)} and {cmd:r(pX2)} are the observed test statistic, degrees of freedom, and the corresponding p-value of Pearson chi-square test: ;{p_end} {p 0 4}{cmd:r(G2)}, {cmd:r(dfG2)} and {cmd:r(pG2)} are the observed test statistic, degrees of freedom, and the corresponding p-value of the likelihood ratio test.{p_end} {p}If there are more than two input variables, then the returned values correspond to the last estimated pair, in the manner similar to {help correlate}. {p}{cmd:polychoricpca} returns the matrices of eigenvectors, eigenvalues, and the correlation matrix, as well as a few largest eigenvalues corresponding to the number of scores requested. {title:Example} {.-} {com}. use c:\stata8\auto {txt}(1978 Automobile Data) {com}. polychoric rep78 foreign {txt}Variables : {res}rep78 foreign {txt}Type : {res}polychoric {txt}Rho = {res}.80668059 {txt}S.e. = {res}.07631279 {txt}Goodness of fit tests: Pearson G2 = {res}.43127115{txt}, Prob( >chi2({res}3{txt})) = {res}.93370948 {txt}LR X2 = {res}.38908216{txt}, Prob( >chi2({res}3{txt})) = {res}.94248852 {txt} {com}. return list {txt}scalars: r(pLR0) = {res}5.12057153705e-08 {txt}r(LR0) = {res}29.67059428252011 {txt}r(pX2) = {res}.9424885157334509 {txt}r(dfX2) = {res}3 {txt}r(X2) = {res}.3890821586898692 {txt}r(pG2) = {res}.9337094786275901 {txt}r(dfG2) = {res}3 {txt}r(G2) = {res}.4312711544473018 {txt}r(se_rho) = {res}.0763127851819864 {txt}r(rho) = {res}.8066805935187174 {txt}r(N) = {res}69 {txt}r(sumw) = {res}69 {txt}macros: r(type) : "{res}polychoric{txt}" matrices: r(R) : {res} 2 x 2 {txt} {com}. polychoric foreign mpg {txt}Variables : {res}foreign mpg {txt}Type : {res}polyserial {txt}Rho = {res}.48603372 {txt}S.e. = {res}.11286311 {txt} {com}. polychoricpca foreign mpg rep78 {txt} k {c |} Eigenvalues {c |} Proportion explained {c |} Cum. explained {dup 4:{c -}}{c +}{dup 15:{c -}}{c +}{dup 24:{c -}}{c +}{dup 18:{c -}} {res} 1{txt} {c |} {res} 2.206757{col 21}{txt}{c |} {res} 0.735586{col 46}{txt}{c |} {res}0.735586 2{txt} {c |} {res} 0.615445{col 21}{txt}{c |} {res} 0.205148{col 46}{txt}{c |} {res}0.940734 3{txt} {c |} {res} 0.177798{col 21}{txt}{c |} {res} 0.059266{col 46}{txt}{c |} {res}1.000000 {txt} {com}. return list {txt}scalars: r(lambda3) = {res}.1777976956026297 {txt}r(lambda2) = {res}.6154453299437229 {txt}r(lambda1) = {res}2.206756974453646 {txt}matrices: r(R) : {res} 3 x 3 {txt}r(eigenvectors) : {res} 3 x 3 {txt}r(eigenvalues) : {res} 1 x 3 {txt} {com}. matrix list r(R) {txt}symmetric r(R)[3,3] foreign mpg rep78 foreign {res} 1 {txt} mpg {res}.55443556 1 {txt} rep78 {res}.80668065 .42655387 1 {txt} {.-} {title:Remarks} {p}{cmd:polychoric} is a bit sloppy with options. It assumes the user might want to specify some {help maximize:maximization options} for the {help ml} command, so anything it does not recognize as its own option is getting transferred to the {cmd:ml}. That may cause an error in the latter. {p}The standard error for the Pearson moment correlation does not account for weights properly. That will be fixed later if anybody needs that standard error. {title:Reference} {p 0 4}{bind:}Kolenikov, S., and Angeles, G. (2004). The Use of Discrete Data in Principal Component Analysis With Applications to Socio-Economic Indices. CPC/MEASURE Working paper No. WP-04-85. {browse "https://www.cpc.unc.edu/measure/publications/pdf/wp-04-85.pdf":Full text in PDF format} {p_end} {title:Also see} {p 0 21}{bind:}Online: help for {help correlate}, {help tetrac} (if installed) {p_end} {p 0 21}{bind:} Internet: {browse "http://www.google.com/search?q=polychoric%20correlation":Google search}{p_end} {title:Contact} Stas Kolenikov, skolenik@unc.edu