Confidence Regions for a Vector Shift Parameter
Proceed analogously to the one-dimensional confidence interval construction of §2.3.3. Introduce a shift parameter to move the data to an arbitrary point null hypothesis. In this one-sample case, one may apply the one-sample test of the null hypothesis that the marginal medians are 0 to the shifted data $X - 1_n \mu^\top$. The set $T$ of shift values $\mu$ at which this test fails to reject at level $\alpha$ satisfies $P[\mu \in T] = 1 - \alpha$, using the test inversion argument of (1.16). This is the one-sample case of the region proposed by Kolassa and Seifu (2013).

Example 7.4.1 Recall again the blood pressure data set of Example 6.4.2. Figure 7.2 is generated by

    library(MultNonParam); shifter(bp[,c("dpd","spd")])

and exhibits the 0.05 contour of p-values for the multivariate test constructed from signed-rank tests for each of systolic and diastolic blood pressure; this contour bounds a 95% confidence region. Note the lack of convexity.

FIGURE 7.2: Median Blood Pressure Change Confidence Region

Two-Sample Methods

Two-sample methods are generally of more interest than the preceding one-sample methods. Consider a multivariate dataset $X_{ij}$ for $i \in \{1, \ldots, M_1 + M_2\}$ and $j \in \{1, \ldots, J\}$, with data from the first sample occupying the first $M_1$ rows of this matrix, and data from the second sample occupying the last $M_2$ rows. Assume that the vectors $(X_{i1}, \ldots, X_{iJ})$ and $(X_{i'1}, \ldots, X_{i'J})$ are independent whenever $i \neq i'$. Assume further that the vectors $(X_{i1}, \ldots, X_{iJ})$ all have the same distribution for $i \le M_1$, and that the vectors $(X_{i1}, \ldots, X_{iJ})$ all have the same distribution for $i > M_1$. Let $g$ be a vector indicating group membership: $g_i = 1$ if $i \le M_1$, and $g_i = 2$ if $i > M_1$. As in §7.1, consider testing and confidence interval questions.

Hypothesis Testing

Combine the techniques for one-dimensional two-sample testing of Chapter 3 and for multidimensional one-sample testing of §7.1. Consider the null hypothesis $H_0$ that the distribution of $(X_{i1}, \ldots, X_{iJ})$ is the same for all $i$.

Permutation Testing

Under the null hypothesis of equality of distributions across the two groups, all assignments of the observed vectors to the two groups that keep the sizes of groups 1 and 2 at $M_1$ and $M_2$ respectively are equally likely. Hence a permutation test may be constructed by evaluating the Hotelling statistic, or any other parametric statistic, at each of the $\binom{M_1 + M_2}{M_1}$ such reassignments of the observations to the groups;
p-values are calculated by counting the number of such statistics with values equal to or exceeding the observed value, and dividing the count by $\binom{M_1 + M_2}{M_1}$.

Other marginal statistics may be combined; for example, one might use the max t-statistic, defined by first calculating univariate t-statistics for each manifest variable, and reporting the maximum. This statistic is inherently one-sided, in that it presumes an alternative in which each marginal distribution for the second group is systematically larger than that of the first group. Alternatively, one might take the absolute value of the t-statistics before maximizing. One might do a parallel analysis with either the maximum of Wilcoxon rank-sum statistics or the maximum of their absolute values, after shifting to have null expectation zero. Finally, one might apply permutation testing to the statistic (7.3), calculated on ranks instead of data values, to make the statistic less sensitive to extreme values.

Example 7.5.1 Consider the data on wheat yields, in metric tons per hectare (Cox and Snell, 1981, Set 5), reflecting yields of six varieties of wheat grown at ten different experimental stations, from http://stat.rutgers.edu/home/kolassa/Data/set5.data . Two of these varieties, Huntsman and Atou, are present at all ten stations, and so the analysis will include only these. Stations are included from three geographical regions of England; compare those in the north to those elsewhere. The standard Hotelling two-sample test may be performed in R using

    wheat<-as.data.frame(scan("set5.data",what=list(variety="",
       y0=0,y1=0,y2=0,y3=0,y4=0,y5=0,y6=0,y7=0,y8=0,y9=0),
       na.strings="-"))
    # Observations are represented by columns rather than by
    # rows. Swap this.
    # New column names are in first column.
    dimnames(wheat)[[1]]<-wheat[,1]
    wheat<-as.data.frame(t(wheat[,-1]))
    dimnames(wheat)[[1]]<-c("E1","E2","N3","N4","N5","N6","W7",
       "E8","E9","N10")
    wheat$region<-factor(c("North","Other")[1+(
       substring(dimnames(wheat)[[1]],1,1)!="N")],
       c("Other","North"))
    attach(wheat)
    plot(Huntsman,Atou,pch=(region=="North")+1,
       main="Wheat Yields")
    legend(6,5,legend=c("Other","North"),pch=1:2)

Data are plotted in Figure 7.3. The normal-theory p-value for testing equality of the bivariate yield distributions in the two regions is given by

    library(Hotelling)#for hotelling.test
    print(hotelling.test(Huntsman+Atou~region))

The results of hotelling.test must be explicitly printed, because the function returns its results invisibly, and so they won't be printed otherwise. The p-value is 0.0327. Comparing this to the univariate results

    t.test(Huntsman~region);t.test(Atou~region)

gives two substantially smaller p-values; in this case, treatment as a multivariate distribution did not improve statistical power. On the other hand, the normal quantile plot for Atou yields shows some lack of normality. Outliers do not appear to be present in these data, but if they were, they could be addressed by performing the analysis on ranks, either using asymptotic normality:

    cat('Wheat rank test, normal theory p-values')
    print(hotelling.test(rank(Huntsman)+rank(Atou)~region))

or using permutation testing to avoid the assumption of multivariate normality:

    #Brute-force way to get estimate of permutation p-value for
    #both T2 and the max t statistic.
    cat('Permutation Tests for Wheat Data, Brute Force')
    obsh<-hotelling.test(Huntsman+Atou~region)$stats$statistic
    obst<-max(c(t.test(Huntsman~region)$statistic,
       t.test(Atou~region)$statistic))
    out<-array(NA,c(1000,2))
    dimnames(out)<-list(NULL,c("Hotelling","t test"))
    for(j in seq(dim(out)[[1]])){
       newr<-sample(region,length(region))
       hto<-hotelling.test(Huntsman+Atou~newr)
       out[j,1]<-hto$stats$statistic>=obsh
       out[j,2]<-max(t.test(Huntsman~newr)$statistic,
          t.test(Atou~newr)$statistic)>=obst
    }
    apply(out,2,mean)

giving permutation p-values for the Hotelling and max-t statistics of 0.023 and 0.003 respectively. The smaller max-t p-value reflects the strong association between variety yields across stations. If one wants only the Hotelling statistic significance via permutation, one could use

    print(hotelling.test(Huntsman+Atou~region,perm=T,
       progBar=FALSE))

The argument progBar will print a progress bar, if desired, and an additional argument controls the number of random permutations.

FIGURE 7.3: Wheat Yields

Permutation Distribution Approximations

Let $T_j$ be the Mann-Whitney-Wilcoxon statistic using manifest variable $j$, for $j \in \{1, \ldots, J\}$. Let $T = (T_1, \ldots, T_J)$. Let $\Sigma = \mathrm{Var}[T]$ be the variance matrix of this statistic, under the permutation distribution. Let $\sigma_{jj'}$ be the element in row $j$ and column $j'$. The diagonal elements $\sigma_{jj}$ are independent of data values (and equal $M_1 M_2 (M_2 + M_1 + 1)/12$, but that's not important here). The remaining entries of $\Sigma$ depend on the data. For $i = 1, \ldots, M_1 + M_2$, let $F_{ij}$ be the number of observations in group 2 that beat observation $i$ on variable $j$ if $i$ is in group 1, and the number of observations in group 1 that $i$ beats on variable $j$ if $i$ is in group 2. Then $4/(M_1 + M_2)$ times the covariance matrix for $F$ estimates the variance matrix of $T$. Kawaguchi et al. (2011) provide details of these calculations.
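
The counts $F_{ij}$ defined above can be sketched in R as follows. This is a minimal illustration on simulated data, not the implementation of Kawaguchi et al. (2011): the helper name pairwise_wins is hypothetical, "beats" is taken to mean strictly larger (so ties are ignored), and the scaling constant is copied from the text as stated rather than checked against that paper.

```r
# Hypothetical helper (not from any package): counts F[i, j] as defined in the text.
# x: numeric matrix, one row per observation, one column per manifest variable.
# g: group labels (1 or 2), one per row of x.
pairwise_wins <- function(x, g) {
  x <- as.matrix(x)
  Fmat <- matrix(0, nrow(x), ncol(x))
  for (j in seq_len(ncol(x))) {
    x1 <- x[g == 1, j]
    x2 <- x[g == 2, j]
    # Group 1 rows: number of group 2 observations that beat observation i.
    Fmat[g == 1, j] <- sapply(x1, function(v) sum(x2 > v))
    # Group 2 rows: number of group 1 observations that observation i beats.
    Fmat[g == 2, j] <- sapply(x2, function(v) sum(x1 < v))
  }
  Fmat
}

set.seed(1)                       # simulated data, for illustration only
m1 <- 6; m2 <- 4
g <- rep(1:2, c(m1, m2))
x <- matrix(rnorm((m1 + m2) * 2), ncol = 2)
Fmat <- pairwise_wins(x, g)

# Summing F over either group gives the Mann-Whitney count for each variable.
u <- colSums(Fmat[g == 2, , drop = FALSE])

# Scaling as stated in the text: 4 / (M1 + M2) times the covariance of F.
sigma_hat <- 4 / (m1 + m2) * cov(Fmat)

# Assumed variant: keep the known diagonal value and take only the
# correlations from F.
d <- m1 * m2 * (m1 + m2 + 1) / 12
sigma_alt <- d * cov2cor(cov(Fmat))
```

Because the correlations computed from F do not depend on the scaling constant, sigma_alt is unaffected by any uncertainty about that constant; whether the covariance should be pooled across the two groups, as done here, is also an assumption.
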
Superior performance can be obtained using known diagonal values, and estimated correlations for the remaining entries of the variance matrix (Chen and Kolassa, 2018).

Example 7.5.2 Consider again the wheat yield data of Example 7.5.1. Asymptotic nonparametric testing is performed using

    library(ICSNP)#For rank.ctest and HotellingsT2.
    rank.ctest(cbind(Huntsman,Atou)~region)

to obtain a p-value of 0.039, or

    rank.ctest(cbind(Huntsman,Atou)~region,scores="sign")
    detach(wheat)

to obtain a multivariate version of Mood's median test. Alternate syntax for rank.ctest consists of calling it with two arguments corresponding to the two data matrices.

Exercises

1. The data set http://ftp.uni-bayreuth.de/math/statlib/datasets/federalistpapers.txt gives data from an analysis of a series of documents. The first column gives document number, the second gives the name of a text file, the third gives a group to which the text is assigned, the fourth represents a measure of the use of first person in the text, the fifth presents a measure of inner thinking, the sixth presents a measure of positivity, and the seventh presents a measure of negativity. There are other columns that you can ignore. (The version at Statlib, above, has odd line breaks. A reformatted version can be found at stat.rutgers.edu/home/kolassa/Data/federalistpapers.txt).
   a. Test the null hypothesis that the multivariate distribution of first person, inner thinking, positivity, and negativity is the same between groups 1 and 2, using a permutation test. Test at $\alpha = .05$.
   b. Construct new variables, the excess of positivity over negativity, and the excess of thinking ahead over thinking behind, by subtracting variable seven from variable six, and variable nine from variable eight. Test the null hypothesis that the multivariate distribution of these two new variables has median zero, versus the general alternative, using the multivariate version of the sign test. Test at $\alpha = .05$.
2. The data at http://lib.stat.cmu.edu/datasets/cloud contain data from a cloud seeding experiment. The first fifteen lines contain comment and label information; ignore these. The second field contains the character S for a seeded trial, and U for unseeded.
   a. The fourth and fifth fields represent rainfalls in two target areas. Test the null hypothesis that the bivariate distribution of observations after seeding is the same as that without seeding. Use the marginal rank sum test.
   b. Repeat part (a) using the permutation version of Hotelling's test.