
# Multivariate Analysis

Suppose that one observes $n$ subjects, indexed by $i \in \{1,\ldots,n\}$, and, for subject $i$, observes responses $X_{ij}$, indexed by $j \in \{1,\ldots,J\}$. Potentially, covariates are also observed for these subjects.

This chapter explores how to explain the multivariate distribution of $X_{ij}$ in terms of these covariates. In the simplest case, these covariates indicate group membership.

## Standard Parametric Approaches

When data vectors may be treated as approximately multivariate Gaussian, the following standard techniques may be applied.

### Multivariate Estimation

Often one wants to learn about the vector of population central values for each of the $J$ responses. In this section, assume that the vectors $(X_{i1},\ldots,X_{iJ})$ are independent and identically distributed.

Standard parametric analyses presuppose distributions of data well enough behaved that location can be well estimated using a sample mean. Denote the mean by the vector $\bar{X}$, where component $j$ of this vector is given by $\bar{X}_j = \sum_{i=1}^n X_{ij}/n$. Then $\bar{X}$ is the method of moments estimator for $\mu = \mathrm{E}[(X_{i1},\ldots,X_{iJ})]$. Assumptions guaranteeing that $\bar{X}$ has an asymptotically Gaussian distribution require the existence of second moments; for independent and identically distributed vectors, finite variances suffice by the multivariate central limit theorem.

### One-Sample Testing

In this section, consider testing the null hypothesis that the vector $\mu$ of expectations takes on some value specified in advance; without loss of generality, take this value to be $0$. Still assuming that the observations are approximately multivariate Gaussian, $\bar{X}$ is approximately multivariate Gaussian. First consider the case in which one knows the variance matrix $\Sigma = \operatorname{Var}[(X_{i1},\ldots,X_{iJ})]$, and assume that $\Sigma$ is nonsingular. Then one can use as a test statistic

$$n\,\bar{X}^\top \Sigma^{-1} \bar{X}, \tag{7.1}$$

and its distribution under the null hypothesis is $\chi^2_J$.

Still under the multivariate Gaussian assumption, if $\Sigma$ is unknown and is estimated by $\hat\Sigma$ using the usual sum of squares, then Hotelling's statistic

$$T^2 = n\,\bar{X}^\top \hat\Sigma^{-1} \bar{X} \tag{7.2}$$

satisfies: $(n-J)T^2/\{J(n-1)\}$ has an $F$ distribution with $J$ numerator and $n-J$ denominator degrees of freedom (Hotelling, 1931). If the distribution of $(X_{i1},\ldots,X_{iJ})$ has a density and a nonsingular variance matrix, and $n > J$, then $P[\hat\Sigma \text{ singular}] = 0$. If $\Sigma$ is unknown and is best estimated by a nonsingular $\hat\Sigma$ other than the sum of squares estimator, then generally (7.2) is approximately $\chi^2_J$. These techniques require that the data vectors be approximately multivariate Gaussian. This assumption is stronger than the assumption that each margin is univariate Gaussian; a simulated example is given in Figure 7.1.
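As an illustration of (7.2), here is a minimal sketch in Python using only the standard library (the chapter's own examples use R). The helper names `solve`, `mean_vector`, `sample_cov`, and `hotelling_one_sample` are hypothetical. Data are assumed to be a list of rows $(X_{i1},\ldots,X_{iJ})$. Only the statistic and its F-scaled value are returned, since the standard library has no F quantile function.

```python
# Sketch: one-sample Hotelling statistic (7.2), standard library only.
# Hypothetical helper names; each row of x is one subject (X_i1, ..., X_iJ).


def solve(a, b):
    """Solve a @ v = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [[float(u) for u in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[r][n] / m[r][r] for r in range(n)]


def mean_vector(x):
    """Component-wise sample means, Xbar_j = sum_i X_ij / n."""
    n = len(x)
    return [sum(row[j] for row in x) / n for j in range(len(x[0]))]


def sample_cov(x):
    """Usual sum-of-squares variance estimate, divisor n - 1."""
    n, J = len(x), len(x[0])
    xbar = mean_vector(x)
    return [[sum((row[j] - xbar[j]) * (row[k] - xbar[k]) for row in x) / (n - 1)
             for k in range(J)] for j in range(J)]


def hotelling_one_sample(x):
    """Return (T2, F) with T2 = n Xbar' Sigma_hat^{-1} Xbar and
    F = (n - J) T2 / (J (n - 1)), which is F(J, n - J) under a Gaussian null."""
    n, J = len(x), len(x[0])
    xbar = mean_vector(x)
    t2 = n * sum(u * v for u, v in zip(xbar, solve(sample_cov(x), xbar)))
    return t2, (n - J) * t2 / (J * (n - 1))


data = [[1.2, 0.8], [0.4, 1.1], [2.0, 1.5], [0.9, -0.3], [1.5, 0.7]]
print(hotelling_one_sample(data))
```

With $J = 1$ the statistic reduces to the square of the one-sample $t$ statistic, which is a convenient sanity check.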

### Two-Sample Testing

Suppose that observations may be divided into two groups of sizes $M_1$ and $M_2$, with the group for observation $i$ indicated by $g_i \in \{1,2\}$. Test the null hypothesis that the multivariate distributions in the two groups are identical; note that this implies identical variance matrices. Let $\bar{X}_k$ be the vector of sample means for observations in group $k$, with components $\bar{X}_{kj} = \sum_{i: g_i = k} X_{ij}/M_k$. Let $\hat\Sigma_{k,j,j'}$ be the sample covariance for the group $k$ values between responses $j$ and $j'$:

$$\hat\Sigma_{k,j,j'} = \sum_{i: g_i = k} (X_{ij} - \bar{X}_{kj})(X_{ij'} - \bar{X}_{kj'})/(M_k - 1).$$

Let $\hat\Sigma_{j,j'}$ be the pooled sample covariance for all observations,

$$\hat\Sigma_{j,j'} = \left((M_1 - 1)\hat\Sigma_{1,j,j'} + (M_2 - 1)\hat\Sigma_{2,j,j'}\right)/(M_1 + M_2 - 2),$$

and let $\hat\Sigma$ be the matrix with these entries. Then the Hotelling two-sample statistic

$$T^2 = \frac{M_1 M_2}{M_1 + M_2}\, (\bar{X}_1 - \bar{X}_2)^\top \hat\Sigma^{-1} (\bar{X}_1 - \bar{X}_2) \tag{7.3}$$

measures the difference between sample mean vectors in a way that accounts for sample variance, and combines the response variables. Furthermore, under the null hypothesis of equality of distribution, and assuming that this distribution is multivariate Gaussian,

$$\frac{M_1 + M_2 - J - 1}{J(M_1 + M_2 - 2)}\, T^2 \sim F_{J,\, M_1 + M_2 - J - 1}.$$
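A comparable sketch of the two-sample statistic (7.3), again standard-library Python with hypothetical helper names, forms the pooled covariance entry by entry as defined above and returns both $T^2$ and its F-scaled value.

```python
# Sketch: two-sample Hotelling statistic (7.3) with pooled covariance.
# Hypothetical helper names; x1 and x2 are lists of rows, one per subject.


def solve(a, b):
    """Solve a @ v = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [[float(u) for u in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[r][n] / m[r][r] for r in range(n)]


def mean_vector(x):
    n = len(x)
    return [sum(row[j] for row in x) / n for j in range(len(x[0]))]


def sample_cov(x):
    """Within-group covariance with divisor (group size - 1)."""
    n, J = len(x), len(x[0])
    xbar = mean_vector(x)
    return [[sum((row[j] - xbar[j]) * (row[k] - xbar[k]) for row in x) / (n - 1)
             for k in range(J)] for j in range(J)]


def hotelling_two_sample(x1, x2):
    """Return (T2, F), F = (M1 + M2 - J - 1) T2 / (J (M1 + M2 - 2))."""
    m1, m2, J = len(x1), len(x2), len(x1[0])
    d = [u - v for u, v in zip(mean_vector(x1), mean_vector(x2))]
    s1, s2 = sample_cov(x1), sample_cov(x2)
    # pooled covariance: ((M1 - 1) S1 + (M2 - 1) S2) / (M1 + M2 - 2)
    pooled = [[((m1 - 1) * s1[j][k] + (m2 - 1) * s2[j][k]) / (m1 + m2 - 2)
               for k in range(J)] for j in range(J)]
    t2 = m1 * m2 / (m1 + m2) * sum(u * v for u, v in zip(d, solve(pooled, d)))
    return t2, (m1 + m2 - J - 1) * t2 / (J * (m1 + m2 - 2))


group1 = [[1.2, 0.8], [0.4, 1.1], [2.0, 1.5], [0.9, -0.3]]
group2 = [[0.1, 0.2], [-0.5, 0.4], [0.3, -0.6], [0.0, 0.9], [-0.2, 0.1]]
print(hotelling_two_sample(group1, group2))
```

With $J = 1$ this reproduces the square of the pooled two-sample $t$ statistic.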

## Nonparametric Multivariate Estimation

In the absence of such parametric assumptions, one might instead measure location using the multivariate median.

FIGURE 7.1: Univariate Normal Data That are Not Bivariate Normal

Define the multivariate population median $\theta$ to be the vector of univariate medians, as defined in §2.3.1. Writing $X_i = (X_{i1},\ldots,X_{iJ})$, an estimator $\operatorname{smed}[X_1,\ldots,X_n]$ of the population multivariate median may be constructed as the vector whose components are the separate marginal sample medians; that is, $\operatorname{smed}[X_1,\ldots,X_n] = (\operatorname{smed}[X_{11},\ldots,X_{n1}],\ldots,\operatorname{smed}[X_{1J},\ldots,X_{nJ}])$.

Alternatively, one might define $\operatorname{smed}[X_1,\ldots,X_n]$ so as to minimize the sum of distances from the median:

$$\operatorname{smed}[X_1,\ldots,X_n] = \arg\min_{\theta} \sum_{i=1}^n \sum_{j=1}^J |X_{ij} - \theta_j|; \tag{7.4}$$

that is, the estimate minimizes the sum of distances from data vectors to the parameter vector, with distance measured by the sum of absolute values of component-wise differences. Because one can interchange the order of summation in (7.4), the minimizer in (7.4) is the vector of component-wise minimizers. Furthermore, the minimizer for each component is the traditional univariate median as above.
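The claim that the minimizer of (7.4) is the vector of marginal medians can be checked numerically. The sketch below is Python with made-up data and hypothetical function names. It uses an odd sample size so that each univariate median is a data value, which makes a search over data values sufficient.

```python
# Sketch: the marginal median minimizes the L1 criterion (7.4).
from itertools import product
from statistics import median


def marginal_median(x):
    """Vector of component-wise sample medians."""
    return [median(row[j] for row in x) for j in range(len(x[0]))]


def l1_criterion(x, theta):
    """Sum over i and j of |X_ij - theta_j|, the objective in (7.4)."""
    return sum(abs(v - t) for row in x for v, t in zip(row, theta))


data = [[1.0, 5.0], [2.0, -1.0], [7.0, 0.0], [3.0, 2.0], [4.0, 9.0]]
# For odd n each marginal median is a data value, so search the grid of data values.
grid = product(*[[row[j] for row in data] for j in range(len(data[0]))])
best = min(grid, key=lambda th: l1_criterion(data, th))
print(marginal_median(data), best)  # [3.0, 2.0] (3.0, 2.0)
```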

A summary of multivariate median concepts is given by Small (1990).

### Equivariance Properties

In the univariate case (that is, $J = 1$), both the mean and the median are equivariant with respect to affine transformations of the raw data, as seen in §2.1.1 and §2.3.1. Equivariance to affine transformations in the multivariate case holds for the mean: for a vector $a$ and a matrix $B$ with $J$ columns, if $Y_i = a + BX_i$ for all $i$, then $\bar{Y} = a + B\bar{X}$. The analogous equality $\operatorname{smed}[Y_1,\ldots,Y_n] = a + B\operatorname{smed}[X_1,\ldots,X_n]$ generally fails unless $B$ is diagonal; hence the multivariate median is not equivariant under affine transformations.
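A small numerical check of these equivariance statements follows as a Python sketch. The data, $a = 0$, and $B$ (the sum-and-difference matrix) are illustrative assumptions. The mean transforms exactly, while the marginal median does not.

```python
# Sketch: the mean is affine equivariant; the marginal median generally is not.
from math import isclose
from statistics import mean, median


def affine(a, B, x):
    """Return rows Y_i = a + B X_i."""
    return [[ak + sum(b * v for b, v in zip(brow, row)) for ak, brow in zip(a, B)]
            for row in x]


def col_stat(f, x):
    """Apply a univariate summary f to each column of x."""
    return [f([row[j] for row in x]) for j in range(len(x[0]))]


data = [[1, 5], [2, -1], [7, 0], [3, 2], [4, 9]]
a, B = [0, 0], [[1, 1], [1, -1]]  # sum-and-difference transformation (assumed)
y = affine(a, B, data)

mean_y, mean_then_b = col_stat(mean, y), affine(a, B, [col_stat(mean, data)])[0]
med_y, med_then_b = col_stat(median, y), affine(a, B, [col_stat(median, data)])[0]
print(all(isclose(u, v) for u, v in zip(mean_y, mean_then_b)))  # True
print(med_y, med_then_b)  # [6, 1] [5, 1]
```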

## Nonparametric One-Sample Testing Approaches

Consider a null hypothesis stating that the marginal median vector $\theta$ takes on a value specified in advance; without loss of generality, take this to be zero. In the multivariate Gaussian context, the statistic (7.2) combines separate location test statistics for the various components of the random vectors, and its distribution depends on multivariate normality of the underlying observations. This section constructs an analogous statistic, combining the various dimensions of $X$, that does not depend on parametric assumptions.

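A nonparametric hypothesis test can be constructed by assembling component-wise nonparametric statistics into a vector $\mathbf{T}$, analogous to $\bar{X}$, and centered so that $\mathrm{E}_0[\mathbf{T}] = 0$. One might combine sign test statistics, or signed-rank statistics if one assumes symmetry, often in the context of paired data. That is,

$$T_j(\mathbf{X}) = \sum_{i=1}^n S(X_{ij}) \tag{7.5}$$

for

$$S(u) = \begin{cases} 1 & \text{for } u \ge 0,\\ -1 & \text{for } u < 0.\end{cases}$$

Or, define $R_{ij}(\mathbf{X})$ to be the marginal rank of $|X_{ij}|$ among $\{|X_{1j}|,\ldots,|X_{nj}|\}$, and set

$$T_j(\mathbf{X}) = \sum_{i=1}^n S(X_{ij})\, R_{ij}(\mathbf{X}). \tag{7.6}$$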

A multivariate test statistic is constructed as a vector of univariate statistics, $\mathbf{T}(\mathbf{X}) = (T_1(\mathbf{X}),\ldots,T_J(\mathbf{X}))^\top$.

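Then combine components of $\mathbf{T}$ from (7.5) to give the multivariate sign test statistic, or from (7.6) to give the multivariate signed-rank test statistic. In either case, components are combined using

$$W = \mathbf{T}^\top \Upsilon^{-1} \mathbf{T}, \tag{7.7}$$

for $\Upsilon = \operatorname{Var}_0[\mathbf{T}]$. As in §2.3, in the case that the null location value is $0$, the null distribution for the multivariate test statistic is generated by assigning equal probabilities to all $2^n$ modifications of the data set obtained by multiplying the rows $(X_{i1},\ldots,X_{iJ})$ by $+1$ or $-1$. That is, the null hypothesis distribution of $\mathbf{T}(\mathbf{X})$ is generated by placing probability $2^{-n}$ on each of the $2^n$ elements of

$$\left\{ \big(\epsilon_1 (X_{11},\ldots,X_{1J}),\ \ldots,\ \epsilon_n (X_{n1},\ldots,X_{nJ})\big) : \epsilon_i \in \{-1,+1\} \right\}. \tag{7.8}$$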
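The statistics (7.5) and (7.6), and their sign-flip null distribution over the set (7.8), can be sketched in Python as follows. This assumes no zero observations and no ties among $|X_{ij}|$ within a column; the function names and data are hypothetical. Exhaustive enumeration is only practical for small $n$.

```python
# Sketch: component-wise sign (7.5) and signed-rank (7.6) statistics and their
# sign-flip null distribution over the 2^n data sets in (7.8).
# Assumes no zero observations and no ties among |X_ij| within a column.
from itertools import product


def sign_score(u):
    """S(u) = 1 for u >= 0 and -1 for u < 0."""
    return 1 if u >= 0 else -1


def abs_ranks(col):
    """Rank of |x| among the absolute values in col (1 = smallest)."""
    order = sorted(range(len(col)), key=lambda i: abs(col[i]))
    ranks = [0] * len(col)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def sign_stat(x):
    """T_j = sum_i S(X_ij), as in (7.5)."""
    return [sum(sign_score(row[j]) for row in x) for j in range(len(x[0]))]


def signed_rank_stat(x):
    """T_j = sum_i S(X_ij) R_ij, as in (7.6)."""
    out = []
    for j in range(len(x[0])):
        col = [row[j] for row in x]
        out.append(sum(sign_score(v) * r for v, r in zip(col, abs_ranks(col))))
    return out


def sign_flip_null(x, stat):
    """stat at each element of (7.8); each value carries probability 2^-n."""
    return [stat([[e * v for v in row] for e, row in zip(eps, x)])
            for eps in product((1, -1), repeat=len(x))]


x = [[1.5, -0.2], [-0.7, 0.9], [2.1, 1.3], [0.4, -1.8]]
print(sign_stat(x), signed_rank_stat(x))  # [2, 0] [6, 0]
null = sign_flip_null(x, sign_stat)
print(len(null))  # 16
```

Averaging either null list component-wise gives zero, reflecting the centering $\mathrm{E}_0[\mathbf{T}] = 0$.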

Test statistics (7.1) and (7.2) arose as quadratic forms in the mean of independent and identically distributed random vectors, and the variances included in their definitions were scaled accordingly. Statistic (7.3) is built using a more complicated variance; this pattern will repeat with nonparametric analogues of parametric tests.

Combining univariate tests into a quadratic form raises two difficulties. In previous applications of rank statistics (the univariate sign and signed-rank one-sample tests, the two-sample Mann-Whitney-Wilcoxon test, and the Kruskal-Wallis test), all dependence of the permutation distribution on the original data was removed through ranking. This is not the case for $\mathbf{T}$, since its distribution involves correlations between the ranks of the various response variables, and these correlations are not specified by the null hypothesis. The separate tests are generally dependent, and the dependence structure depends on the distribution of the raw observations. The asymptotic distribution of (7.7) relies on this dependence through the correlations between components of $\mathbf{T}$, and so these correlations must be estimated.

Furthermore, the distribution of $W$ of (7.7) under the null hypothesis depends on the coordinate system for the variables, but, intuitively, this dependence on the coordinate system might be undesirable. For example, suppose that $(X_{i1}, X_{i2})$ has an approximate multivariate Gaussian distribution, with expectation $\mu$ and variance $\Sigma$, with $\Sigma$ known. Consider the null hypothesis $H_0: \mu = 0$. Then the canonical test is (7.1), and it is unchanged if the test is based on $(U_i, V_i)$ for $U_i = X_{i1} + X_{i2}$ and $V_i = X_{i1} - X_{i2}$, with $\Sigma$ replaced by the variance matrix of $(U_i, V_i)$. Hence the parametric analysis is independent of the coordinate system.
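The contrast between (7.1) and a sign-based statistic of the form (7.7) can be seen numerically. In this Python sketch the data and $\Sigma$ are made up, $\Sigma$ is carried to the $(U_i, V_i)$ coordinates as $A\Sigma A^\top$ with $A$ the sum-and-difference matrix, and the sign-statistic variance uses the sign-flip covariance $\sum_i S(X_{ij})S(X_{ij'})$ discussed in this section. Function names are hypothetical.

```python
# Sketch: statistic (7.1) with known Sigma is unchanged by the change of
# coordinates U = X1 + X2, V = X1 - X2; a sign-based W of form (7.7) is not.
from math import isclose


def inv_quad_2x2(v, m):
    """v' m^{-1} v for a nonsingular symmetric 2 x 2 matrix m."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return (m[1][1] * v[0] ** 2 - 2 * m[0][1] * v[0] * v[1] + m[0][0] * v[1] ** 2) / det


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]


def known_sigma_stat(x, sigma):
    """n Xbar' Sigma^{-1} Xbar, statistic (7.1)."""
    n = len(x)
    xbar = [sum(row[j] for row in x) / n for j in range(2)]
    return n * inv_quad_2x2(xbar, sigma)


def sign_W(x):
    """T' Upsilon^{-1} T with sign scores and Upsilon_jj' = sum_i S(X_ij) S(X_ij')."""
    s = [[1 if v >= 0 else -1 for v in row] for row in x]
    t = [sum(r[j] for r in s) for j in range(2)]
    ups = [[sum(r[j] * r[k] for r in s) for k in range(2)] for j in range(2)]
    return inv_quad_2x2(t, ups)


x = [[1.0, 0.5], [0.8, -0.3], [-0.4, 0.9], [1.2, 1.1], [-0.6, -0.2], [0.3, -0.9]]
sigma = [[2.0, 0.5], [0.5, 1.0]]   # assumed known variance matrix
A = [[1, 1], [1, -1]]              # (X1, X2) -> (U, V)
uv = [[a + b, a - b] for a, b in x]
sigma_uv = matmul(matmul(A, sigma), [list(c) for c in zip(*A)])  # A Sigma A'

print(isclose(known_sigma_stat(x, sigma), known_sigma_stat(uv, sigma_uv)))  # True
print(round(sign_W(x), 4), round(sign_W(uv), 4))  # 0.6667 1.0
```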

The first of these difficulties is readily addressed. Under $H_0$, the marginal sign test statistic (7.5) satisfies the approximation that $T_j/\sqrt{n}$ is approximately $N(0,1)$. Conditional on the relative ranks of the absolute values of the observations, the permutation distribution is entirely specified, and conditional joint moments can be calculated. Under the permutation distribution,

$$\operatorname{Cov}_0[T_j, T_{j'}] = \sum_{i=1}^n S(X_{ij})\, S(X_{ij'}), \tag{7.9}$$

because reassigning signs multiplies every component of row $i$ by the same $\epsilon_i$, so that $T_j = \sum_i \epsilon_i S(X_{ij})$ with independent $\epsilon_i$ of mean $0$ and variance $1$. Hence the variance matrix used in (7.7) has diagonal components $\Upsilon_{jj} = n$. Here again, the covariance is defined under the distribution that consists of $2^n$ random reassignments of signs to the data vectors, each equally weighted. As before, variances do not depend on data values, but covariances do. The solution for the Wilcoxon signed-rank test is also available (Bennett, 1965).
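Formula (7.9), and its signed-rank analogue $\sum_i S(X_{ij})S(X_{ij'})R_{ij}R_{ij'}$ (which follows by the same sign-flip argument, since flipping signs leaves the ranks of absolute values unchanged), can be verified by brute force. The check enumerates every element of (7.8), recomputes the scores, and averages the products. This Python sketch uses hypothetical names and assumes no zeros and no ties in absolute values.

```python
# Sketch: check covariance (7.9), and its signed-rank analogue, against brute-force
# enumeration of the 2^n sign-flipped data sets in (7.8).
# Assumes no zero observations and no ties among |X_ij| within a column.
from itertools import product


def sign_score(u):
    return 1 if u >= 0 else -1


def abs_ranks(col):
    order = sorted(range(len(col)), key=lambda i: abs(col[i]))
    ranks = [0] * len(col)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def sign_scores(x):
    """a_ij = S(X_ij)."""
    return [[sign_score(v) for v in row] for row in x]


def signed_rank_scores(x):
    """a_ij = S(X_ij) R_ij, with R_ij the rank of |X_ij| within column j."""
    ranks = [abs_ranks([row[j] for row in x]) for j in range(len(x[0]))]
    return [[sign_score(v) * ranks[j][i] for j, v in enumerate(row)]
            for i, row in enumerate(x)]


def formula_cov(a):
    """Cov_0[T_j, T_j'] = sum_i a_ij a_ij'."""
    J = len(a[0])
    return [[sum(r[j] * r[k] for r in a) for k in range(J)] for j in range(J)]


def enumerated_cov(x, scores):
    """Average of T_j T_j' over all sign flips (E_0[T] = 0, so this is the covariance)."""
    J, stats = len(x[0]), []
    for eps in product((1, -1), repeat=len(x)):
        a = scores([[e * v for v in row] for e, row in zip(eps, x)])
        stats.append([sum(r[j] for r in a) for j in range(J)])
    return [[sum(t[j] * t[k] for t in stats) / len(stats) for k in range(J)]
            for j in range(J)]


x = [[1.0, 0.5], [0.8, -0.3], [-0.4, 0.9], [1.2, 1.1], [-0.6, -0.2], [0.3, -0.7]]
for scores in (sign_scores, signed_rank_scores):
    print(scores.__name__, formula_cov(scores(x)), enumerated_cov(x, scores))
```

The diagonal entries are $n$ for signs and $\sum_{r=1}^n r^2$ for signed ranks whatever the data, while the off-diagonal entries change with the data, as stated above.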

The second problem may be addressed by using the data to redefine the coordinate system (Randles, 1989; Oja and Randles, 2004).

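Combine the components of $\mathbf{T}(\mathbf{X})$ to construct the statistic

$$W = \mathbf{T}(\mathbf{X})^\top \hat\Upsilon^{-1} \mathbf{T}(\mathbf{X}),$$

using an estimate $\hat\Upsilon$ of $\Upsilon = \operatorname{Var}[\mathbf{T}]$ as in (7.9), or similarly for the signed-rank statistic.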

The multivariate central limit theorem of Hájek (1960), and the quality of the approximation $\hat\Upsilon$ to $\Upsilon$, justify approximating the null distribution of $W$ by a $\chi^2_J$ distribution. The test rejects the null hypothesis of zero component-wise medians when

$$W \ge G_J^{-1}(1 - \alpha, 0), \tag{7.10}$$

for $G_J^{-1}(1 - \alpha, 0)$ the $1 - \alpha$ quantile of the $\chi^2_J$ distribution with non-centrality parameter $0$. Bickel (1965) discusses these (and other) tests in generality.
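For $J = 2$, the $\chi^2_2$ upper tail has the closed form $P(\chi^2_2 \ge w) = e^{-w/2}$, so its $1-\alpha$ quantile is $-2\log\alpha$. This allows the test (7.10) for the sign statistic to be sketched without any statistics library. In this Python sketch the function name is hypothetical, $\hat\Upsilon$ is taken from (7.9), and a singular $\hat\Upsilon$ is reported with p-value 1, as is done for the simulations behind Tables 7.1 and 7.2.

```python
# Sketch: multivariate sign test (7.10) for J = 2, using the closed-form
# chi-square(2) tail P(chi2_2 >= w) = exp(-w / 2) and quantile -2 log(alpha).
from math import exp, log


def sign_test_2d(x, alpha=0.05):
    """Return (W, p_value, reject); a singular Upsilon gives p-value 1, no rejection."""
    s = [[1 if v >= 0 else -1 for v in row] for row in x]
    t = [sum(r[j] for r in s) for j in range(2)]
    # Upsilon from (7.9): entries sum_i S(X_ij) S(X_ij')
    a = sum(r[0] * r[0] for r in s)
    b = sum(r[0] * r[1] for r in s)
    d = sum(r[1] * r[1] for r in s)
    det = a * d - b * b
    if det == 0:
        return None, 1.0, False
    w = (d * t[0] ** 2 - 2 * b * t[0] * t[1] + a * t[1] ** 2) / det
    crit = -2 * log(alpha)  # G_2^{-1}(1 - alpha, 0)
    return w, exp(-w / 2), w >= crit


x = [[1.0, 0.5], [0.8, -0.3], [-0.4, 0.9], [1.2, 1.1], [-0.6, -0.2], [0.3, -0.9]]
print(sign_test_2d(x))
```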

Example 7.3.1 Consider the data of Example 6.4.2. We test the null hypothesis that the joint distribution of systolic and diastolic blood pressure changes is symmetric about $(0,0)$, using Hotelling's $T^2$ and the two asymptotic tests that substitute signs and signed ranks for data. These tests are performed in R using

```r
# HotellingsT2 and rank.ctest are both provided by the ICSNP package
library(ICSNP)
# bp holds the blood pressure changes of Example 6.4.2 (columns spd and dpd)
cat("One-sample Hotelling Test\n")
HotellingsT2(bp[, c("spd", "dpd")])
cat("Multivariate Sign Test\n")
rank.ctest(bp[, c("spd", "dpd")], scores = "sign")
cat("Multivariate Signed Rank Test\n")
rank.ctest(bp[, c("spd", "dpd")])  # default scores = "rank" (signed ranks)
```

P-values for Hotelling's $T^2$, the marginal signed-rank test, and the marginal sign test are $9.839 \times 10^{-6}$, $2.973 \times 10^{-3}$, and $5.531 \times 10^{-4}$, respectively.

Tables 7.1 and 7.2 contain attained levels and powers for one-sample multivariate tests of nominal level 0.05 with two manifest variables, for data from various distributions.

TABLE 7.1: Level of multivariate tests

| Test | Size 20, Normal | Size 20, Cauchy | Size 20, Laplace | Size 40, Normal | Size 40, Cauchy | Size 40, Laplace |
|---|---|---|---|---|---|---|
| $T^2$ | 0.04845 | 0.01550 | 0.04335 | 0.05025 | 0.01665 | 0.04605 |
| Sign test | 0.04215 | 0.04130 | 0.04365 | 0.04865 | 0.05055 | 0.04650 |
| Sign rank test | 0.04115 | 0.03895 | 0.04170 | 0.05065 | 0.04980 | 0.04745 |

TABLE 7.2: Power of multivariate tests

| Test | Size 20, Normal | Size 20, Cauchy | Size 20, Laplace | Size 40, Normal | Size 40, Cauchy | Size 40, Laplace |
|---|---|---|---|---|---|---|
| $T^2$ | 0.74375 | 0.08090 | 0.48460 | 0.97760 | 0.08845 | 0.79215 |
| Sign test | 0.70330 | 0.26290 | 0.55560 | 0.96955 | 0.52135 | 0.88700 |
| Sign rank test | 0.52615 | 0.32605 | 0.55025 | 0.87520 | 0.64720 | 0.89445 |

Tests compared are Hotelling's $T^2$ test and test (7.10) applied to the sign and signed-rank statistics. Tests have close to their nominal levels, except for Hotelling's test with the Cauchy distribution; furthermore, the agreement is closer for sample size 40 than for sample size 20. The sign test has power close to that of Hotelling's test for Gaussian variables, while the signed-rank test has attenuated power. Both nonparametric tests have good power for the Cauchy distribution, where Hotelling's test performs poorly, and both perform better than Hotelling's test for Laplace variables.

A few of the data sets simulated to create Tables 7.1 and 7.2 yield a singular estimate of the variance matrix of $\mathbf{T}$. Care must be taken to avoid difficulties; in such cases, p-values are set to 1.
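Level estimates like those in Table 7.1 come from repeated simulation under the null hypothesis. The following Python sketch is not the author's simulation code. It assumes bivariate Gaussian data with correlation `rho` (default 0), 2000 replications, a fixed seed, and the $J = 2$ sign test with the singular case treated as p-value 1 (never rejected); its estimate will differ somewhat from the table entries.

```python
# Sketch: Monte Carlo estimate of the level of the J = 2 multivariate sign test.
# Hypothetical settings: Gaussian data, correlation rho, reps replications, fixed seed.
import random
from math import log, sqrt


def sign_W_2d(x):
    """W = T' Upsilon^{-1} T for J = 2 sign scores; None if Upsilon is singular."""
    s = [[1 if v >= 0 else -1 for v in row] for row in x]
    t = [sum(r[j] for r in s) for j in range(2)]
    a = sum(r[0] * r[0] for r in s)
    b = sum(r[0] * r[1] for r in s)
    d = sum(r[1] * r[1] for r in s)
    det = a * d - b * b
    if det == 0:
        return None
    return (d * t[0] ** 2 - 2 * b * t[0] * t[1] + a * t[1] ** 2) / det


def estimated_level(n=20, reps=2000, rho=0.0, alpha=0.05, seed=1):
    rng = random.Random(seed)
    crit = -2 * log(alpha)  # chi-square(2) 1 - alpha quantile
    rejections = 0
    for _ in range(reps):
        x = []
        for _ in range(n):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            x.append([z1, rho * z1 + sqrt(1 - rho ** 2) * z2])
        w = sign_W_2d(x)
        # singular Upsilon: p-value set to 1, so the test does not reject
        if w is not None and w >= crit:
            rejections += 1
    return rejections / reps


print(estimated_level())
```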

### More General Permutation Solutions

One might address this problem using permutation testing. First, select an existing test statistic $U(\mathbf{X})$, perhaps a Hotelling statistic or a rank-based statistic. Under the permutation null distribution, the sampling distribution puts equal weight $2^{-n}$ on each of the $2^n$ values of the statistic evaluated at the elements of (7.8); these $2^n$ values need not all be distinct. For $n$ large enough to make exhaustive evaluation prohibitive, a random subset of elements of (7.8) may be selected. The p-value is reported as the proportion of data sets with permuted signs having a test statistic value as large as, or larger than, that observed. In this way, the analysis of the previous subsection for the sign test, and by extension the signed-rank test, can be extended to general rank tests, including tests with data as scores.
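The sign-flip permutation test for a general statistic $U(\mathbf{X})$ can be sketched as below. The statistic `norm_mean_stat` ($n\|\bar X\|^2$) is only a hypothetical placeholder for any statistic, such as a Hotelling or rank-based one. Passing `n_draws` switches from exhaustive enumeration of (7.8) to a random subset; the seed is an assumption.

```python
# Sketch: sign-flip permutation p-value for an arbitrary statistic U(X).
import random
from itertools import product


def norm_mean_stat(x):
    """Hypothetical statistic U(X) = n * ||Xbar||^2; any function of x could be used."""
    n, J = len(x), len(x[0])
    return n * sum((sum(row[j] for row in x) / n) ** 2 for j in range(J))


def sign_flip_pvalue(x, stat, n_draws=None, seed=0):
    """Proportion of sign-flipped data sets with stat >= the observed value."""
    observed = stat(x)
    if n_draws is None:
        # exhaustive: all 2^n elements of (7.8), each with weight 2^-n
        configs = product((1, -1), repeat=len(x))
    else:
        # random subset of (7.8) when 2^n is too large to enumerate
        rng = random.Random(seed)
        configs = ([rng.choice((1, -1)) for _ in x] for _ in range(n_draws))
    count = total = 0
    for eps in configs:
        flipped = [[e * v for v in row] for e, row in zip(eps, x)]
        total += 1
        if stat(flipped) >= observed:
            count += 1
    return count / total


data = [[1.5, -0.2], [-0.7, 0.9], [2.1, 1.3], [0.4, -1.8], [0.9, 0.6]]
print(sign_flip_pvalue(data, norm_mean_stat))
print(sign_flip_pvalue(data, norm_mean_stat, n_draws=1000))
```

With a random subset, the observed sign configuration need not be among the draws; the sketch simply follows the proportion described above.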
