


Appendix 3.B Brief overview of correspondence analysis

The mathematics of correspondence analysis can be a little challenging because it mostly relies on matrix algebra. The main matrix algebra tool is the Singular Value Decomposition (SVD) outlined in the Appendix to Chapter 2. Recall from that discussion that the SVD method decomposes a matrix A of size n × p into three parts which, when multiplied together, return the original matrix A:

A = U Λ V^{T}

where
• U is an n × p orthogonal matrix such that U^{T} U = I;
• Λ is a p × p diagonal matrix whose diagonal values are the singular values; and
• V^{T} is a p × p orthogonal matrix such that V^{T} V = I where I is p × p.

The following is based on Greenacre [2007], which is the definitive presentation and development of correspondence analysis. An I × J matrix S of standardized residuals is the basis for correspondence analysis. The Singular Value Decomposition (SVD) is applied to S to give

S = U Λ V^{T}
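As a quick illustration of the decomposition (a generic numpy sketch, not tied to the book's data):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))                   # an n x p matrix, n = 5, p = 3

# Thin SVD: U is n x p, the singular values are in descending order, Vt is p x p
U, singular_values, Vt = np.linalg.svd(A, full_matrices=False)
Lambda = np.diag(singular_values)

# Multiplying the three parts back together returns the original matrix A
print(np.allclose(U @ Lambda @ Vt, A))        # True
print(np.allclose(U.T @ U, np.eye(3)))        # True: U'U = I
print(np.allclose(Vt @ Vt.T, np.eye(3)))      # True: V'V = I
```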
The SVD is shown in the SVD Components portion of Figure 3.22. The matrix Λ is an I × J diagonal matrix with elements in descending order. The diagonal elements are singular values. The left matrix, U, is I × I and provides information about the rows of the original crosstab. The right matrix, V^{T}, is J × J and provides information about the columns of the crosstab. The main output of a correspondence analysis is a map, which means that plotting coordinates are needed. Since the crosstab is rows by columns, a set of plotting coordinates is needed for the rows and another set is needed for the columns. These are given as functions of the SVD components (with D_{r} and D_{c} the diagonal matrices of row and column masses). The row coordinates are designated as Φ and are based on

Φ = D_{r}^{-1/2} U

while the column coordinates, designated as Γ, are based on

Γ = D_{c}^{-1/2} V

These sets are sometimes called the Standard Row Coordinates and the Standard Column Coordinates, respectively. For plotting purposes, however, these are usually adjusted as

F = Φ Λ

and

G = Γ Λ
These are the Principal Row Coordinates and Principal Column Coordinates, respectively. These are shown in the Coordinates section of Figure 3.22. The full correspondence analysis for the example crosstab is shown in Figure 3.21. Notice that the plotting coordinates agree with the principal coordinates in Figure 3.21. The SVD provides more information than just the plotting coordinates. It also provides measures of the amount of variation in the table explained by the dimensions. These are the inertias. The inertia is the variation in the crosstab. See Greenacre and Korneliussen [2017]. It can be shown that the singular values from the SVD of the crosstab are related to the Pearson chi-square value. The singular values are usually arranged in descending order with the corresponding eigenvectors appropriately arranged. The square of a singular value is called the inertia of the table, where the concept of inertia comes from the physics of a rigid body. In particular, it is the force or torque necessary to change the angular momentum of the rigid body. The formulas for the moment of inertia and the variance are the same; hence, in the correspondence analysis literature the variance of the table is referred to as the inertia.^{16} There is a singular value for each dimension extracted from the crosstab, where the total number of dimensions that could be extracted is d = min(r − 1, c − 1) for r rows and c columns of the table. The singular value for the i-th dimension is SV_{i}. If λ_{i} is the inertia for the i-th dimension, then the total inertia is Λ = Σ_{i=1}^{d} λ_{i}. It can be shown that λ_{i} = SV_{i}^{2}, so Λ = Σ_{i=1}^{d} SV_{i}^{2}. It can also be shown that the total chi-square of the table is χ² = N × Λ, where N is the total sample size. This means that χ² = N Σ_{i=1}^{d} λ_{i}. From Figure 3.13, you have the data in Table 3.9.

Appendix 3.C Very brief overview of ordinary least squares analysis

Assume there is one dependent variable arranged as an n × 1 vector Y.
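The correspondence-analysis quantities above — standardized residuals, SVD, principal coordinates, and the inertia/chi-square identity χ² = N × Λ — can be sketched end to end. The crosstab counts below are invented for illustration (they are not the book's Figure 3.5 data):

```python
import numpy as np

# Hypothetical 3 x 4 crosstab of counts, for illustration only
table = np.array([[20, 24, 80, 36],
                  [18, 52, 56, 29],
                  [12, 48, 19, 75]], dtype=float)

N = table.sum()                      # total sample size
P = table / N                        # correspondence matrix of relative frequencies
r = P.sum(axis=1)                    # row masses
c = P.sum(axis=0)                    # column masses
Dr_inv_sqrt = np.diag(1 / np.sqrt(r))
Dc_inv_sqrt = np.diag(1 / np.sqrt(c))

# Matrix of standardized residuals
S = Dr_inv_sqrt @ (P - np.outer(r, c)) @ Dc_inv_sqrt

# SVD of S; the singular values sv come back in descending order
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

# Principal coordinates: standard coordinates scaled by the singular values
F = Dr_inv_sqrt @ U * sv             # principal row coordinates
G = Dc_inv_sqrt @ Vt.T * sv          # principal column coordinates

# Inertia check: total inertia = sum of squared singular values = chi-square / N
inertia = (sv ** 2).sum()
expected = np.outer(r, c) * N
chi2 = ((table - expected) ** 2 / expected).sum()
print(np.isclose(N * inertia, chi2))   # True
```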
Also assume there are p > 1 independent variables arranged in an n × (p + 1) matrix X, where the first column consists of 1s for the constant term. Then a model is

Y = Xβ + ε (3.C.1)

where β is a (p + 1) × 1 vector of parameters to be estimated, with the first element as the constant, and ε is an n × 1 vector of random disturbance terms. It is usually assumed that ε_{i} ~ N(0, σ²).

TABLE 3.9 This table illustrates the calculations for the inertia values for the correspondence analysis. N = 4300 from Figure 3.5; χ² = 92.613 from Figure 3.14.

FIGURE 3.21 This is a comprehensive correspondence analysis report for the example prototype table in Figure 3.20.

FIGURE 3.22 These are the details for the Singular Value Decomposition calculations for the correspondence analysis in Figure 3.21.

It is easy to show that

β̂ = (X^{T} X)^{-1} X^{T} Y
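The OLS estimator β̂ = (X^{T}X)^{-1}X^{T}Y can be sketched numerically; the design matrix and coefficient values below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented example: n = 200 observations, p = 2 regressors plus a constant column
n = 200
x = rng.normal(size=(n, 2))
X = np.column_stack([np.ones(n), x])                 # first column of 1s for the constant
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(scale=0.1, size=n)    # Y = X beta + epsilon

# OLS estimator: solve (X'X) beta_hat = X'Y rather than inverting X'X explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(np.round(beta_hat, 2))
```

With perfect multicollinearity, X^{T}X is singular and `solve` raises a `LinAlgError`, mirroring the point made below that the parameters cannot be estimated.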
Then E(β̂) = β and V(β̂) = σ² (X^{T}X)^{-1}. See Greene [2003] for a detailed development of this result. Also see Goldberger [1964] for a classic derivation. If there is perfect multicollinearity, then the X^{T}X matrix cannot be inverted and the parameters cannot be estimated.

Brief overview of principal components analysis

Principal components analysis works by finding a transformation of the X matrix into a new matrix such that the column vectors of the new matrix are uncorrelated. An important first step in principal components analysis is to mean-center the data. This involves finding the mean for each variable and then subtracting that mean from its respective variable. This puts each variable on a common, zero-mean footing so that differences in level do not distort the components. To mean-center, let X be the n × p matrix of variables. Let 1_{n} be a column vector with a 1 for each element so that 1_{n} is n × 1. Then a 1 × p row vector of means is given by

x̄^{T} = (1_{n}^{T} 1_{n})^{-1} 1_{n}^{T} X
where (1_{n}^{T} 1_{n})^{-1} = 1/n. The mean-centered matrix is then

X − 1_{n} x̄^{T}

and this centered matrix is used (and still written as X) in what follows.
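A quick sketch of this centering step with numpy (the array values are invented):

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])          # n = 3 observations, p = 2 variables
n = X.shape[0]
ones = np.ones((n, 1))               # the 1_n vector

means = (1 / n) * ones.T @ X         # 1 x p row vector of means: (1'1)^{-1} 1'X
Xc = X - ones @ means                # mean-centered matrix: X - 1_n xbar^T

print(Xc.sum(axis=0))                # each column now sums to zero: [0. 0.]
```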
I can now do an SVD on the centered X to get X = UΣP^{T}, where U and P are orthogonal matrices. Since P is orthogonal, PP^{T} = I, implying that P^{T} = P^{-1}. Similarly for U. Let T = UΣ so that X = TP^{T}. Then XP = TP^{T}P, or T = XP. The matrix T is the matrix of principal component scores and P is the matrix of principal components that transform X. See Ng [2013]. Following Ng [2013], you can now write the covariance matrix for the principal component scores, V(T), as

V(T) = (1/(n − 1)) T^{T} T = P^{T} S P
where S = (1/(n − 1)) X^{T}X. The matrix S is a covariance matrix and is square, so a spectral decomposition can be applied to get S = UDU^{T}. Then,

V(T) = P^{T} U D U^{T} P
Let P = U; then

V(T) = U^{T} U D U^{T} U = D
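This diagonalization is easy to check numerically. A minimal sketch with random, correlated data (the names are illustrative): center X, take its SVD, form the scores T = XP, and confirm that the covariance of the scores is diagonal with decreasing entries:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))  # 100 x 4, correlated columns
X = X - X.mean(axis=0)                                   # mean-center

U, s, Pt = np.linalg.svd(X, full_matrices=False)         # X = U Sigma P^T
P = Pt.T                                                 # principal components
T = X @ P                                                # principal component scores

cov_T = T.T @ T / (X.shape[0] - 1)                       # V(T) = (1/(n-1)) T'T
off_diag = cov_T - np.diag(np.diag(cov_T))
print(np.allclose(off_diag, 0))                          # True: scores are uncorrelated
print(np.all(np.diff(np.diag(cov_T)) <= 0))              # True: variances decrease
```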
Since D is diagonal, the diagonal elements are the variances, and they are in decreasing order. Also, since D is diagonal, the off-diagonal elements are all zero, implying that the scores are uncorrelated. Finally, as noted by Lay [2012], the matrix of principal components, P, makes the covariance matrix for the scores diagonal. Without loss of generality, the columns of T are arranged in descending order of the variance explained, so that the first column explains the most variance in X, the second column explains the second most, and so forth. That is,

T = XP
where P is a p × p transformation matrix called the principal components and T is the resulting n × p matrix of principal component scores resulting from the transformation. Usually, only the first k < p columns of T are needed since they account for most of the variation in X. This reduced matrix can be denoted as T_{k}. In principal components regression, the reduced matrix T_{k} replaces the matrix X in the OLS formulation. See Jolliffe [2002] for the definitive treatment of principal components analysis.

Principal components regression analysis

Principal components regression analysis involves using the principal components scores as the independent variables in a regression model. The columns of this score matrix are orthogonal by construction, so multicollinearity is not an issue. If Y is a column vector for the dependent variable, then the model is Y = Tβ, ignoring the disturbance term vector for simplicity, and OLS can be used.

Brief overview of partial least squares analysis

Partial least squares (PLS), initially developed by Wold [1966b], works by finding linear combinations of the independent variables, called manifest variables, which are directly observable. The linear combinations are latent or hidden in the data and are sometimes called factors, components (as in PCA), latent vectors, or latent variables. The factors should be independent of each other and account for most of the variance of Y. This is akin to principal components analysis. PLS uses the result that X can be decomposed into X = TP^{T}, as shown above. The matrix T is the score matrix for X. In particular, a single linear combination or factor can be extracted from the X matrix, say t, which is one of many such possible factors. This factor represents a reduced combination of the variables in X, which means it can be used in regression models for predicting X and Y. Let the predictions be X̂_{0} and Ŷ_{0}. The subscript "0" on both predictions indicates that this is the base or initial prediction.
The two predictions are based on OLS estimations using the OLS estimation formula from above. In this case, the extracted factor, t, is the independent variable and X_{0} and Y_{0} are the dependent variables. The prediction for X_{0} is given by X̂_{0} = t(t^{T}t)^{-1}t^{T}X_{0}. Similarly, Ŷ_{0} = t(t^{T}t)^{-1}t^{T}Y_{0}. The factor t as a linear combination of the manifest independent variables is important. This is, however, only one factor combination out of many possible combinations. The combination used should meet a criterion, and this is that the factor for X should have the maximum covariance with a factor extracted for Y. The extracted factor for Y is u = Y_{0}q. The covariance is cov(t, u) = t^{T}u. So the objective is to extract factors (or latent linear combinations of the manifest independent and dependent variables) such that the covariance between them is as large as possible. Once the first pair of factors is extracted, you have to find another pair that meets the same criterion. You cannot, however, have the first set be used again, so it has to be deleted. This is done by subtraction, thus creating two new matrices. That is, you now have X_{1} = X_{0} − X̂_{0} and Y_{1} = Y_{0} − Ŷ_{0}. This is sometimes referred to as "partialing out" the effect of a factor. The process outlined above is repeated using these two new matrices. The overall process of doing OLS regressions and partialing out the predicted values is continued until either you reach a desired number of extracted factors or no more factors can be extracted. The combination of OLS regressions and partialing out predicted values is the basis of the name partial least squares.^{17} Since predicted values are partialed out of both the X and Y matrices, an iterative algorithm can be specified. This is usually written as a set of successive steps, iterated for i = 0, 1, …, n, where n is the maximum number of iterations of the algorithm.
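The extraction-and-deflation loop described above can be sketched as follows. This follows the standard NIPALS recipe; the variable names, data, and single-response setup are illustrative, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 5))                 # manifest independent variables
Y = X[:, :2] @ np.array([[1.0], [2.0]]) + rng.normal(scale=0.1, size=(50, 1))

n_factors, tol = 2, 1e-12
X0, Y0 = X - X.mean(axis=0), Y - Y.mean(axis=0)

for _ in range(n_factors):
    u = Y0[:, [0]]                           # initialize u from a column of Y
    for _ in range(200):                     # inner loop: alternate t and u updates
        w = X0.T @ u / (u.T @ u)             # X-weights from the current u
        w /= np.linalg.norm(w)
        t = X0 @ w                           # X factor (score)
        q = Y0.T @ t / (t.T @ t)             # Y-loadings from t
        u_new = Y0 @ q / (q.T @ q)           # Y factor (score)
        converged = np.linalg.norm(u_new - u) < tol
        u = u_new
        if converged:
            break
    # Partial out (deflate): subtract the OLS predictions based on t
    X0 = X0 - t @ (t.T @ X0) / (t.T @ t)     # X1 = X0 - X0_hat
    Y0 = Y0 - t @ (t.T @ Y0) / (t.T @ t)     # Y1 = Y0 - Y0_hat
```

After each deflation the extracted score t is orthogonal to the remaining X matrix, which is why the same criterion can be applied afresh on the new pair of matrices.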
Stop the iterations either when the number of desired iterations (i.e., factors) is reached or no more factors can be extracted, as determined by a convergence criterion.^{18} The SAS Proc PLS implementation uses a default of n = 200 iterations and a default convergence criterion of 10^{-12}. The algorithm outlined here is called the NIPALS algorithm, which stands for "Nonlinear Iterative Partial Least Squares." It was developed by Wold [1966a]. An alternative algorithm is SIMPLS. See de Jong [1993] for a discussion. There are software packages that implement this PLS algorithm. SAS has Proc PLS and JMP has a partial least squares platform. The book by Cox and Gaudard [2013] gives an excellent overview of PLS using JMP. An interesting history of PLS is provided by Gaston Sanchez: "The Saga of PLS" at sagaofpls.github.io.

Notes
