
Principal Component Analysis

Principal component analysis (PCA) is widely used for constructing a subspace that approximates the distribution of training data and has a strong relationship with the SVD mentioned above. Let a set of training data be denoted by D = {x_1, x_2, ..., x_M}, where the x_i (i = 1, 2, ..., M) are D-vectors. Let x̄ denote the mean of the x_i (i = 1, 2, ..., M), and let the D × D covariance matrix be denoted by Σ_cov, where

Σ_cov = (1/M) Σ_{i=1}^{M} (x_i − x̄)(x_i − x̄)^T.

It is clear that Σ_cov is a symmetric real matrix. In many cases, the intrinsic dimension of the distribution of the training data is lower than the dimension of each datum, D, and PCA is used to obtain a low-dimensional subspace that approximates the distribution.

In PCA, a set of D pairs of an eigenvalue and an eigenvector is computed from the covariance matrix of the training data, Σ_cov. Let λ_i ∈ ℝ (i = 1, 2, ..., D) denote the eigenvalues and let w_i (i = 1, 2, ..., D) denote the eigenvectors. It is known that, given a symmetric real matrix, Σ_cov, there exist D pairs (λ_i, w_i) that satisfy the following equation:

Σ_cov w_i = λ_i w_i,   (2.60)

and that the set of the eigenvectors, {w_i | i = 1, 2, ..., D}, is an orthonormal basis of the D-dimensional space: ||w_i|| = 1 and w_i^T w_j = 0 if i ≠ j. Assume that the eigenvalues are in decreasing order, λ_1 ≥ λ_2 ≥ ... ≥ λ_D, and let the set of eigenvectors corresponding to the K largest (K < D) eigenvalues be denoted by W_K = {w_i | 1 ≤ i ≤ K}.

When approximating the training data using a subspace, Π, every training datum, x_i, is projected onto the subspace, and the original data are approximated by the projected data. Let x̂_i denote the projected data, and let the approximation error, E(Π), be defined as follows:

E(Π) = Σ_{i=1}^{M} ||x_i − x̂_i||².

Then, under the condition that the dimension of the subspace is fixed to K, the subspace that minimizes E(Π) is span(W_K), and the unit vectors w_i (i = 1, 2, ..., K) that span the subspace are called the principal components of the distribution of the training data. The projected data, x̂_i, can be represented as

x̂_i = x̄ + W_K z_i,   (2.62)

where z_i = [z_1, z_2, ..., z_K]^T is the coordinate vector of x_i with respect to the basis, W_K = {w_1, w_2, ..., w_K}, and can be computed as follows:

z_i = W_K^T (x_i − x̄),   (2.63)

where W_K is the D × K matrix such that W_K = [w_1, w_2, ..., w_K]. Substituting (2.63) into (2.62) results in

x̂_i = x̄ + W_K W_K^T (x_i − x̄).
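As a concrete illustration of the projection and reconstruction above, here is a minimal NumPy sketch (the variable names `X`, `K`, `W_K` and the synthetic data are our assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
M, D, K = 200, 5, 2

# Synthetic training data with an anisotropic covariance (illustrative only).
X = rng.normal(size=(M, D)) @ rng.normal(size=(D, D))

x_bar = X.mean(axis=0)            # mean vector x̄
Xc = X - x_bar                    # centered data x_i − x̄
cov = Xc.T @ Xc / M               # empirical covariance Σ_cov

# np.linalg.eigh returns eigenvalues in ascending order;
# reverse both to match the convention λ_1 ≥ λ_2 ≥ ... ≥ λ_D.
lam, W = np.linalg.eigh(cov)
lam, W = lam[::-1], W[:, ::-1]

W_K = W[:, :K]                    # D × K matrix of principal components
Z = Xc @ W_K                      # coordinates z_i = W_K^T (x_i − x̄)
X_hat = x_bar + Z @ W_K.T         # reconstruction x̂_i = x̄ + W_K z_i

err = np.sum((X - X_hat) ** 2)    # approximation error E(Π)
print(err)
```

For this choice of subspace the error E(Π) equals M times the sum of the discarded eigenvalues, λ_{K+1} + ... + λ_D, which is the minimum over all K-dimensional subspaces.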

When the training data follow a Gaussian distribution, each eigenvalue indicates the variance of the distribution along the corresponding eigenvector. Let a D-dimensional Gaussian distribution be denoted by N(x | x̄, Σ_cov), where the D-vector, x̄, denotes the mean and the D × D matrix, Σ_cov, denotes the covariance:

N(x | x̄, Σ_cov) = (2π)^{−D/2} |Σ_cov|^{−1/2} exp(−(1/2)(x − x̄)^T Σ_cov^{−1} (x − x̄)).   (2.66)

From (2.60), the following equation about the covariance matrix holds:

Σ_cov W = W Λ,

where W = [w_1, w_2, ..., w_D] and Λ = diag(λ_1, λ_2, ..., λ_D). Multiplying both sides by W^T = W^{−1} from the right, one obtains

Σ_cov = W Λ W^T.   (2.70)

Taking the inverse of both sides results in

Σ_cov^{−1} = W Λ^{−1} W^T.   (2.71)

Substituting (2.71) into (2.66) results in

N(x | x̄, Σ_cov) = (2π)^{−D/2} |Σ_cov|^{−1/2} exp(−(1/2)(x − x̄)^T W Λ^{−1} W^T (x − x̄)).   (2.73)

Here, let y = [y_1, y_2, ..., y_D]^T = W^T x. Because {w_i | i = 1, 2, ..., D} is an orthonormal basis, W is a rotation matrix or a reflection matrix. Substituting y = W^T x into Eq. (2.73) results in

N(x | x̄, Σ_cov) = (2π)^{−D/2} |Σ_cov|^{−1/2} exp(−(1/2)(y − ȳ)^T Λ^{−1} (y − ȳ)),

where ȳ = W^T x̄. Because Λ^{−1} is a diagonal matrix and |Σ_cov| = |Λ| = λ_1 λ_2 ⋯ λ_D, the result is

N(x | x̄, Σ_cov) = ∏_{i=1}^{D} N(y_i | ȳ_i, λ_i).

The D-dimensional Gaussian distribution is a product of D one-dimensional Gaussian distributions of the y_i, of which the mean is ȳ_i and the variance is λ_i. Here, it should be remembered that y = W^T x and that y is the coordinate vector of x represented using the basis, {w_i | i = 1, 2, ..., D}. The one-dimensional Gaussian distribution, N(y_i | ȳ_i, λ_i), represents the distribution along the eigenvector, w_i, and the corresponding eigenvalue, λ_i, represents the variance along that direction. The principal components that span the subspace for the data approximation correspond to the eigenvectors along which the distribution has the larger variances.
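The claim that each eigenvalue equals the variance along its eigenvector can be checked numerically. The following sketch (sample size, seed, and the chosen true eigenvalues are arbitrary assumptions of ours) draws from a Gaussian with known covariance and compares the per-axis variances of y = W^T x with the eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(1)
M, D = 100_000, 3

# Build a Gaussian whose true covariance Q diag(true_lam) Q^T has known
# eigenvalues true_lam and eigenvectors given by the columns of Q.
true_lam = np.array([4.0, 1.0, 0.25])
Q, _ = np.linalg.qr(rng.normal(size=(D, D)))   # random orthonormal basis
X = (rng.normal(size=(M, D)) * np.sqrt(true_lam)) @ Q.T

x_bar = X.mean(axis=0)
cov = (X - x_bar).T @ (X - x_bar) / M
lam, W = np.linalg.eigh(cov)
lam, W = lam[::-1], W[:, ::-1]                  # decreasing order

# Coordinates y_i = W^T (x_i − x̄); their per-axis variances are the eigenvalues.
Y = (X - x_bar) @ W
print(Y.var(axis=0))   # matches lam, i.e. the variance along each eigenvector
```

With 100,000 samples the estimated eigenvalues also agree closely with the true ones, 4.0, 1.0, and 0.25, up to sampling error.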

Comparing the SVD of a matrix shown in (2.56) with (2.70), it can be seen that U and V in (2.56) are identical to W in (2.70) and that the singular values are identical to the eigenvalues, σ_i = λ_i.
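This correspondence between the SVD and the eigendecomposition can be verified directly for any symmetric positive semi-definite matrix; the matrix below is a stand-in for Σ_cov, not data from the text:

```python
import numpy as np

rng = np.random.default_rng(2)
D = 4
A = rng.normal(size=(D, D))
cov = A @ A.T                    # symmetric positive semi-definite, like Σ_cov

# Eigendecomposition: cov = W Λ W^T, with eigenvalues in decreasing order.
lam, W = np.linalg.eigh(cov)
lam, W = lam[::-1], W[:, ::-1]

# SVD: cov = U Σ V^T. For a symmetric positive semi-definite matrix the
# singular values coincide with the eigenvalues, and the columns of U (and V)
# are the eigenvectors, up to a sign per column.
U, s, Vt = np.linalg.svd(cov)
print(np.allclose(s, lam))       # singular values σ_i equal eigenvalues λ_i
```

The per-column sign ambiguity is why U and W can differ by factors of −1 even though they represent the same eigenvectors.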
