


The Jackknife
Consider a related technique, the jackknife (Quenouille, 1956; Miller, 1974), which estimates the bias of an estimator using values of the estimator evaluated on subsets of the observed sample, under certain conditions on the form of the bias. Suppose that the bias of an estimator $T_n$ of $\theta$ based on $n$ observations is

$$\mathrm{E}_\theta[T_n] - \theta = a/n + O(n^{-3/2}) \qquad (10.10)$$

for some unknown $a$. Here $O(n^{-3/2})$ denotes a quantity which, after multiplication by $n^{3/2}$, is bounded for all $n$.

Examples of Biases of the Proper Order

Quenouille (1956) suggested this technique in situations where (10.10) held with at least one more term, of the form $b/n^2$, and with a correspondingly smaller error. That is, (10.10) is replaced with

$$\mathrm{E}_\theta[T_n] - \theta = a/n + b/n^2 + O(n^{-5/2}). \qquad (10.11)$$
For example, if $X_1, \ldots, X_n$ are independent and identically distributed with distribution $N(\mu, \sigma^2)$, the maximum likelihood estimator of $\sigma^2$ is $\hat\sigma^2 = \sum_{j=1}^n (X_j - \bar X)^2/n$. Note that $s^2 = n\hat\sigma^2/(n-1)$ is an unbiased estimator of $\sigma^2$, since $\mathrm{E}_{\sigma^2}[\hat\sigma^2] = (n-1)\sigma^2/n$; hence (10.10) holds with $a = -\sigma^2$ and no error term.

Restriction (10.11) seems unnecessarily strict, and cases in which it holds are less common. For example, let $W_n$ be an unbiased estimator of a parameter $\omega$, and let $T_n = g(W_n)$ for some smooth function $g$. Let $\theta = g(\omega)$, and assume that $\mathrm{Var}[W_n] \approx a/n$. Then

$$g(W_n) \approx g(\omega) + (W_n - \omega) g'(\omega) + (W_n - \omega)^2 g''(\omega)/2,$$

and

$$\mathrm{E}[T_n] - \theta \approx g''(\omega)\,\mathrm{Var}[W_n]/2 \approx a g''(\omega)/(2n).$$

Then (10.10) holds, and (10.11) holds if the skewness of $W_n$ times $\sqrt{n}$ converges to 0.

Bias Correction

Let $T_{n-1,i}$ be the estimator based on the sample of size $n-1$ with observation $i$ omitted. Let $T_n^* = \sum_{i=1}^n T_{n-1,i}/n$. Then the bias of $T_n^*$ is approximately $a/(n-1)$. Under (10.10), $B = (n-1)(T_n^* - T_n)$ estimates the bias, since

$$\mathrm{E}[B] \approx (n-1)\left(\frac{a}{n-1} - \frac{a}{n}\right) = \frac{a}{n}.$$
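The correction $B = (n-1)(T_n^* - T_n)$ is simple to compute for any statistic. The sketch below is in Python for illustration (the book's own examples use R); the function names `jackknife_bias` and `var_mle` and the sample values are hypothetical, not from the text. It shows a well-known exact case: jackknifing the maximum likelihood variance estimator recovers the unbiased sample variance.

```python
# Sketch of the jackknife bias estimate B = (n - 1) * (T* - T_n).
# Function names and data are illustrative, not from the text.

def jackknife_bias(x, stat):
    """Return (T_n, B), with B = (n - 1) * (mean of leave-one-out stats - T_n)."""
    n = len(x)
    t_n = stat(x)
    # T_{n-1,i}: the statistic computed with observation i omitted
    loo = [stat(x[:i] + x[i + 1:]) for i in range(n)]
    t_star = sum(loo) / n
    return t_n, (n - 1) * (t_star - t_n)

def var_mle(x):
    """Maximum likelihood variance estimate: divides by n, bias -sigma^2/n."""
    m = sum(x) / len(x)
    return sum((xi - m) ** 2 for xi in x) / len(x)

x = [0.0, 1.0, 4.0]
t_n, b = jackknife_bias(x, var_mle)
# For the variance MLE the correction is exact: T_n - B equals the
# unbiased estimator s^2 = sum((x_i - xbar)^2) / (n - 1).
corrected = t_n - b
```

Here the bias of $\hat\sigma^2$ is exactly $-\sigma^2/n$ with no remainder, and $T_n - B$ reproduces $s^2$ exactly; applying the same function to the sample mean gives $B = 0$.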
Then the bias of $T_n - B$ is $O(n^{-3/2})$. Furthermore, under the more stringent requirement (10.11), the bias is $O(n^{-2})$.

Correcting the Bias in Mean Estimators

Sample means are always unbiased estimators of the population expectation. Consider application of the jackknife to the sample mean. In this case, $T_n = \sum_{j=1}^n X_j/n$, $T_{n-1,i} = \sum_{j \neq i} X_j/(n-1)$, and

$$T_n^* = \sum_{i=1}^n T_{n-1,i}/n = \sum_{j=1}^n X_j/n = T_n,$$

and hence the bias estimate is zero.

Correcting the Bias in Quantile Estimators

Consider the jackknife bias estimate for the median for continuous data, and, for the sake of defining the $T_{n-1,i}$, let $i$ index the order statistics $X_{(i)}$. When the sample size $n$ is even, then $T_n = (X_{(n/2)} + X_{(n/2+1)})/2$; omitting any observation below the median leaves $X_{(n/2+1)}$ as the median of the remaining $n-1$ observations, and omitting any observation above the median leaves $X_{(n/2)}$. Then

$$T_n^* = (X_{(n/2)} + X_{(n/2+1)})/2 = T_n,$$

and the bias estimate is always 0. For $n$ odd, then $T_n = X_{((n+1)/2)}$; omitting an observation below the median gives $(X_{((n+1)/2)} + X_{((n+3)/2)})/2$, omitting an observation above the median gives $(X_{((n-1)/2)} + X_{((n+1)/2)})/2$, and omitting the median itself gives $(X_{((n-1)/2)} + X_{((n+3)/2)})/2$, and the average of results from the smaller sample is

$$T_n^* = \frac{1}{n}\left[\frac{n-1}{4}\left(X_{((n-1)/2)} + 2X_{((n+1)/2)} + X_{((n+3)/2)}\right) + \frac{1}{2}\left(X_{((n-1)/2)} + X_{((n+3)/2)}\right)\right].$$

Hence

$$T_n^* - T_n = \frac{n+1}{2n}\left(\frac{X_{((n-1)/2)} + X_{((n+3)/2)}}{2} - X_{((n+1)/2)}\right),$$

and the bias estimate is

$$B = \frac{n^2-1}{2n}\left(\frac{X_{((n-1)/2)} + X_{((n+3)/2)}}{2} - X_{((n+1)/2)}\right). \qquad (10.12)$$
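The closed form (10.12) can be checked against the general definition $B = (n-1)(T_n^* - T_n)$. A minimal Python sketch (the sample values are made up for illustration):

```python
# Check that, for odd n, the jackknife bias estimate for the median equals
# ((n^2 - 1) / (2n)) * ((X_{(m-1)} + X_{(m+1)})/2 - X_{(m)}), m = (n + 1)/2.
# Illustrative data, not from the text.
import statistics

x = sorted([1.0, 2.0, 4.0, 8.0, 16.0])  # n = 5, odd
n = len(x)
t_n = statistics.median(x)

# Generic jackknife bias: B = (n - 1) * (T* - T_n)
loo = [statistics.median(x[:i] + x[i + 1:]) for i in range(n)]
b_generic = (n - 1) * (sum(loo) / n - t_n)

# Closed form (10.12): neighbors of the middle order statistic
m = (n + 1) // 2                 # 1-based rank of the median
a_, c_ = x[m - 2], x[m]          # X_{(m-1)} and X_{(m+1)} in 0-based indexing
b_closed = ((n ** 2 - 1) / (2 * n)) * ((a_ + c_) / 2 - t_n)
```

Both routes give the same value, driven entirely by how far the middle observation sits from the average of its two neighbors.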
Example 10.4.1 Consider again the nail arsenic data of Example 2.3.2. Calculate the jackknife estimate of bias for the mean, median, and trimmed mean for these data. Jackknifing is done using the bootstrap library, using the function jackknife:

    library(bootstrap) # gives jackknife
    jackknife(arsenic$nails, median)

This function produces the 21 values, each with one observation omitted:

    $jack.values
     [1] 0.2220 0.2220 0.2220 0.2220 0.1665 0.1665 0.2220 0.2220
     [9] 0.1665 0.2220 0.2220 0.1665 0.1665 0.1665 0.1665 0.1665
    [17] 0.1665 0.2220 0.1665 0.2220 0.2135

Each of these values is the median of 20 observations. Ten of them, corresponding to the omission of one of the lowest ten values, are the average of $X_{(11)}$ and $X_{(12)}$. Ten of them, corresponding to the omission of one of the highest ten values, are the average of $X_{(10)}$ and $X_{(11)}$. The last, corresponding to the omission of the middle value, is the average of $X_{(10)}$ and $X_{(12)}$. The mean of the jackknife observations is 0.1952. The sample median is 0.1750, and the bias adjustment is $20 \times (0.1952 - 0.1750) = 20 \times 0.0202 = 0.404$, as is given by R:

    $jack.bias
    [1] 0.4033333

This bias estimate for the median seems remarkably large. From (10.12), the jackknife bias estimate for the median is governed by the difference between the middle value and the average of its neighbors. This data set features an unusually large gap between the middle observation and the one above it. Applying the jackknife to the mean via

    jackknife(arsenic$nails, mean)

gives the bias correction 0:

    $jack.bias
    [1] 0

as predicted above. One can also jackknife the trimmed mean:

    jackknife(arsenic$nails, mean, trim=0.25)

The above mean is defined as the mean of the middle half of the data, with 0.25 proportion trimmed from each end.
Although the conventional mean is unbiased, this unbiasedness does not extend to the trimmed mean:

    $jack.bias
    [1] 0.02450216

In contrast to the arsenic example with an odd number of data points, consider applying the jackknife to the median of the ten brain volume differences from Example 5.2.1:

    attach(brainpairs); jackknife(diff, median)

to give 0 as the bias correction. The average effects of such corrections may be investigated via simulation. Table 10.2 contains the results of a simulation based on 100,000 random samples for data sets of size 11. In this exponential case, the jackknife bias correction overcorrects the median, but appears to address the trimmed mean exactly.

TABLE 10.2: Expectations of statistic and Jackknife bias estimate
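A simulation along the lines of Table 10.2 can be sketched as follows, here in Python with standard exponential samples of size 11; the replication count and seed are arbitrary choices, not those used for the table.

```python
# Monte Carlo sketch: average jackknife bias estimate for the median,
# standard exponential samples of size n = 11.  Parameters are arbitrary.
import random
import statistics

random.seed(1)
n, reps = 11, 20000
bias_estimates = []
for _ in range(reps):
    x = [random.expovariate(1.0) for _ in range(n)]
    t_n = statistics.median(x)
    loo = [statistics.median(x[:i] + x[i + 1:]) for i in range(n)]
    bias_estimates.append((n - 1) * (sum(loo) / n - t_n))

mean_b = sum(bias_estimates) / reps
# The true bias of the sample median here is about 0.043, while the
# jackknife estimate averages about 0.091, so on average the jackknife
# overcorrects the median, consistent with Table 10.2.
```

The two reference values in the comment follow from exponential order-statistic means: $\mathrm{E}[X_{(6)}] = \sum_{j=6}^{11} 1/j \approx 0.737$ against the population median $\log 2 \approx 0.693$, versus $\mathrm{E}[B] = \frac{120}{22}\cdot\frac{1/5 - 1/6}{2} \approx 0.091$.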
Under some more restrictive conditions, one can also use this idea to estimate the variance of $T_n$.

Exercises

1. The data set http://ftp.uni-bayreuth.de/math/statlib/datasets/lupus gives data on 87 lupus patients. The fourth column gives transformed disease duration.
a. Give a 90% bootstrap confidence interval for the mean transformed disease duration, using the basic, Studentized, and BCa approaches.
b. Give a jackknife estimate of the bias of the mean and of the 0.25 trimmed mean transformed disease duration (that is, the sample average of the middle half of the transformed disease durations).
2. The data set http://ftp.uni-bayreuth.de/math/statlib/datasets/federalistpapers.txt gives data from an analysis of a series of documents. The first column gives document number, the second gives the name of a text file, the third gives a group to which the text is assigned, the fourth represents a measure of the use of first person in the text, and the fifth presents a measure of inner thinking. There are other columns that you can ignore. (The version at Statlib, above, has odd line breaks. A reformatted version can be found at stat.rutgers.edu/home/kolassa/Data/federalistpapers.txt.)
a. Calculate a bootstrap confidence interval, with confidence level .95, for the regression coefficient of inner thinking regressed on first person. Test at $\alpha = .05$. Provide basic, Studentized, and BCa intervals. Do the fixed-X bootstrap.
b. Calculate a bootstrap confidence interval, with confidence level .95, for the regression coefficient of inner thinking regressed on first person. Provide basic, Studentized, and BCa intervals. Do not do the fixed-X bootstrap; resample pairs of data.
c. Calculate a bootstrap confidence interval, with confidence level .95, for the $R^2$ statistic for inner thinking regressed on first person. Provide basic and BCa intervals. Do not do the fixed-X bootstrap; resample pairs of data.
