Consider a related technique, the jackknife (Quenouille, 1956; Miller, 1974), which estimates the bias of an estimator using values of the estimator evaluated on subsets of the observed sample, under certain conditions on the form of the bias. Suppose that the bias of an estimator T_n of θ based on n observations satisfies

E[T_n] − θ = a/n + O(n^{−3/2}) (10.10)

for some unknown a. Here O(n^{−3/2}) denotes a quantity which, after multiplication by n^{3/2}, is bounded for all n.
Examples of Biases of the Proper Order
Quenouille (1956) suggested this technique in situations where (10.10) held with at least one more term, of the form b/n², and with a correspondingly smaller error. That is, (10.10) is replaced by

E[T_n] − θ = a/n + b/n² + O(n^{−5/2}). (10.11)
For example, if X_1, ..., X_n are independent and identically distributed with distribution N(μ, σ²), the maximum likelihood estimator for σ² is σ̂² = Σ_{j=1}^n (X_j − X̄)²/n. Note that σ̃² = nσ̂²/(n − 1) is an unbiased estimator of σ², since E[σ̂²] = (n − 1)σ²/n; hence the bias of σ̂² is exactly −σ²/n, and (10.11) holds with a = −σ² and b = 0.
Restriction (10.11) seems unnecessarily strict, and cases in which it holds are less common. For example, let W_n be an unbiased estimator of a parameter ω, and let T_n = g(W_n) for some smooth function g. Let θ = g(ω), and assume that Var[W_n] ≈ a/n. Then a Taylor expansion of g about ω gives

E[T_n] ≈ θ + g″(ω) a/(2n).

Then (10.10) holds, and (10.11) holds if √n times the skewness of W_n converges to 0.
Let T*_{n−1,i} be the estimator based on the sample of size n − 1 with observation i omitted, and let T̄* = Σ_{i=1}^n T*_{n−1,i}/n. Then the bias of T̄* is approximately a/(n − 1). Under (10.10), B̂ = (n − 1)(T̄* − T_n) estimates the bias, since

E[B̂] ≈ (n − 1)(a/(n − 1) − a/n) = a/n.
Then the bias of T_n − B̂ is O(n^{−3/2}). Furthermore, under the more stringent requirement (10.11), the bias is O(n^{−2}).
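The bias estimate and correction above can be sketched in a few lines of code (a hypothetical helper, not code from the text). Applied to the maximum likelihood variance estimator from the Normal example earlier, the jackknife correction is not merely approximate: it reproduces the usual unbiased variance estimator exactly.

```python
def jackknife_bias(x, stat):
    """Quenouille's jackknife bias estimate: (n - 1) * (mean of leave-one-out values - T_n)."""
    n = len(x)
    t_n = stat(x)
    t_bar = sum(stat(x[:i] + x[i + 1:]) for i in range(n)) / n
    return (n - 1) * (t_bar - t_n)

def mle_var(x):
    """Maximum likelihood variance estimate: divides by n rather than n - 1."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)

x = [1.2, 3.4, 2.2, 5.1, 4.4, 0.7]
corrected = mle_var(x) - jackknife_bias(x, mle_var)

m = sum(x) / len(x)
unbiased = sum((v - m) ** 2 for v in x) / (len(x) - 1)  # the usual s^2
print(corrected, unbiased)  # agree exactly, up to floating-point rounding
```

For this statistic the expansion (10.11) holds with b = 0, which is why the correction removes the bias entirely rather than only reducing its order.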
Correcting the Bias in Mean Estimators
Sample means are always unbiased estimators of the population expectation. Consider application of the jackknife to the sample mean. In this case T_n = Σ_{j=1}^n X_j/n, T*_{n−1,i} = Σ_{j≠i} X_j/(n − 1), and

T̄* = Σ_{i=1}^n Σ_{j≠i} X_j/(n(n − 1)) = (n − 1) Σ_{j=1}^n X_j/(n(n − 1)) = T_n,

and hence the bias estimate is zero.
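This identity is easy to confirm numerically (a quick check, not from the text): the average of the leave-one-out means coincides with the full-sample mean, so the bias estimate vanishes.

```python
# Check that the jackknife bias estimate for the sample mean is exactly zero.
x = [3.1, 0.2, 5.7, 2.8, 4.4]
n = len(x)
t_n = sum(x) / n
loo_means = [(sum(x) - xi) / (n - 1) for xi in x]  # leave-one-out means
bias_est = (n - 1) * (sum(loo_means) / n - t_n)
print(bias_est)  # zero up to floating-point rounding
```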
Correcting the Bias in Quantile Estimators
Consider the jackknife bias estimate for the median for continuous data, and, for the sake of defining the T*_{n−1,i}, let i index the order statistics X_(i).
When the sample size n is even, then T_n = (X_(n/2) + X_(n/2+1))/2. Omitting an observation from the lower half of the sample leaves a sample of odd size n − 1 with median X_(n/2+1), and omitting one from the upper half leaves median X_(n/2). Then

T̄* = (X_(n/2) + X_(n/2+1))/2 = T_n,

and the bias estimate is always 0. For n odd, then T_n = X_((n+1)/2). Omitting an observation below the median leaves median (X_((n+1)/2) + X_((n+3)/2))/2, omitting one above the median leaves (X_((n−1)/2) + X_((n+1)/2))/2, and omitting the median itself leaves (X_((n−1)/2) + X_((n+3)/2))/2. The average of results from the smaller samples is

T̄* = [(n + 1)(X_((n−1)/2) + X_((n+3)/2))/4 + (n − 1)X_((n+1)/2)/2]/n,

and the bias estimate is

B̂ = (n − 1)(T̄* − T_n) = ((n² − 1)/(2n))[(X_((n−1)/2) + X_((n+3)/2))/2 − X_((n+1)/2)]. (10.12)
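The closed form for the odd-sample-size median, B̂ = ((n² − 1)/(2n))[(X_((n−1)/2) + X_((n+3)/2))/2 − X_((n+1)/2)], can be checked against the direct leave-one-out computation (a numerical check, not code from the text):

```python
def median(x):
    s = sorted(x)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

x = [0.4, 1.9, 0.7, 3.2, 1.1, 0.9, 2.5]  # n = 7, odd
n = len(x)
t_n = median(x)
t_bar = sum(median(x[:i] + x[i + 1:]) for i in range(n)) / n
direct = (n - 1) * (t_bar - t_n)  # direct jackknife bias estimate

s = sorted(x)
m = (n + 1) // 2  # rank of the median (1-based); s[m - 1] is the sample median
closed = ((n ** 2 - 1) / (2 * n)) * ((s[m - 2] + s[m]) / 2 - s[m - 1])
print(direct, closed)  # both approximately 1.0286 for this sample
```

As the closed form makes explicit, the estimate depends only on the gap between the middle order statistic and the average of its two neighbors.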
Example 10.4.1 Consider again the nail arsenic data of Example 2.3.2. Calculate the jackknife estimate of bias for the mean, median, and trimmed mean for these data. Jackknifing is done using the bootstrap library, using the function jackknife.
library(bootstrap) # gives jackknife
jackknife(arsenic$nails, median)
This function produces the 21 values, each with one observation omitted:

$jack.values
0.2220 0.2220 0.2220 0.2220 0.1665 0.1665 0.2220 0.2220 0.1665 0.2220 0.2220
0.1665 0.1665 0.1665 0.1665 0.1665 0.1665 0.2220 0.1665 0.2220 0.2135
Each of these values is the median of 20 observations. Ten of them, corresponding to the omission of one of the lowest ten values, are the averages of X_(11) and X_(12). Ten of them, corresponding to the omission of one of the highest ten values, are the averages of X_(10) and X_(11). The last, corresponding to the omission of the middle value, is the average of X_(10) and X_(12). The mean of the jackknife observations is 0.1952. The sample median is 0.1750, and the bias adjustment is 20 × (0.1952 − 0.1750) = 20 × 0.0202 = 0.404, as is given by R.
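This arithmetic can be reproduced from the jackknife values printed above (carrying full precision gives 0.4033, which matches the text's 0.404 once the intermediate difference is rounded to 0.0202):

```python
# Recompute the bias adjustment from the leave-one-out medians printed by jackknife().
jack_values = [0.2220, 0.2220, 0.2220, 0.2220, 0.1665, 0.1665, 0.2220,
               0.2220, 0.1665, 0.2220, 0.2220, 0.1665, 0.1665, 0.1665,
               0.1665, 0.1665, 0.1665, 0.2220, 0.1665, 0.2220, 0.2135]
n = len(jack_values)          # 21 observations
t_bar = sum(jack_values) / n  # mean of the jackknife values
bias = (n - 1) * (t_bar - 0.1750)  # 0.1750 is the sample median
print(round(t_bar, 4), round(bias, 4))  # → 0.1952 0.4033
```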
This bias estimate for the median seems remarkably large. From (10.12), the jackknife bias estimate for the median is governed by the difference between the middle value and the average of its neighbors. This data set features an unusually large gap between the middle observation and the one above it.
Applying the jackknife to the mean via

jackknife(arsenic$nails, mean)

gives the bias correction 0, as predicted above. One can also jackknife the trimmed mean:

jackknife(arsenic$nails, mean, trim = 0.25)
This trimmed mean is defined as the mean of the middle half of the data, with proportion 0.25 trimmed from each end. Although the conventional mean is unbiased, this unbiasedness does not extend to the trimmed mean:
$jack.bias
-0.02450216
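The same computation can be sketched in plain code (synthetic data here, since the full arsenic sample is not listed above; the trimming rule is intended to mirror R's mean(x, trim=), which drops floor(n·trim) sorted observations from each end):

```python
def trimmed_mean(x, trim=0.25):
    # Drop floor(n * trim) sorted values from each end, as R's mean(x, trim=) does.
    s = sorted(x)
    k = int(len(s) * trim)
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)

def jackknife_bias(x, stat):
    """Jackknife bias estimate: (n - 1) * (mean of leave-one-out values - T_n)."""
    n = len(x)
    t_bar = sum(stat(x[:i] + x[i + 1:]) for i in range(n)) / n
    return (n - 1) * (t_bar - stat(x))

# Hypothetical right-skewed sample of size 11, standing in for the arsenic data.
x = [0.12, 0.31, 0.07, 0.95, 0.22, 0.18, 0.44, 0.09, 1.30, 0.26, 0.16]
b = jackknife_bias(x, trimmed_mean)
print(b)  # generally nonzero: the trimmed mean of skewed data is biased
```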
In contrast to the arsenic example with an odd number of data points, consider applying the jackknife to the median of the ten brain volume differences from Example 5.2.1; since the sample size is even, the bias correction is 0.
The average effects of such corrections may be investigated via simulation. Table 10.2 contains the results of a simulation based on 100,000 random samples for data sets of size 11. In this exponential case, the jackknife bias correction over-corrects the median, but appears to address the trimmed mean exactly.
TABLE 10.2: Expectations of statistic and Jackknife bias estimate
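A scaled-down version of such a simulation can be sketched as follows (a sketch only, assuming standard exponential data and far fewer than the 100,000 replicates behind Table 10.2; the exact numbers vary with the seed):

```python
import math
import random
import statistics

random.seed(1)
reps, n = 2000, 11
raw, corrected = [], []
for _ in range(reps):
    x = [random.expovariate(1.0) for _ in range(n)]
    t_n = statistics.median(x)
    # Jackknife bias estimate: (n - 1) * (mean leave-one-out median - t_n)
    t_bar = sum(statistics.median(x[:i] + x[i + 1:]) for i in range(n)) / n
    bias_est = (n - 1) * (t_bar - t_n)
    raw.append(t_n)
    corrected.append(t_n - bias_est)

true_median = math.log(2)  # population median of the standard exponential
# Simulated bias of the raw and bias-corrected medians:
print(sum(raw) / reps - true_median, sum(corrected) / reps - true_median)
```

Comparing the two printed quantities shows how far the correction moves the median's average, and in which direction, relative to the uncorrected estimator.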
Under some more restrictive conditions, one can also use this idea to estimate the variance of T.
1. The data set
http://ftp.uni-bayreuth.de/math/statlib/datasets/lupus
gives data on 87 lupus patients. The fourth column gives transformed disease duration.
a. Give a 90% bootstrap confidence interval for the mean transformed disease duration, using the basic, Studentized, and BCa approaches.
b. Give a jackknife estimate of the bias of the mean and of the 0.25 trimmed mean transformed disease duration (that is, the sample average of the middle half of the transformed disease duration).
2. The data set
gives data from an analysis of a series of documents. The first column gives document number, the second gives the name of a text file, the third gives a group to which the text is assigned, the fourth represents a measure of the use of first person in the text, and the fifth presents a measure of inner thinking. There are other columns that you can ignore. (The version at Statlib, above, has odd line breaks. A reformatted version can be found at
a. Calculate a bootstrap confidence interval, with confidence level .95, for the regression coefficient of inner thinking regressed on first person. Test at α = .05. Provide basic, Studentized, and BCa intervals. Do the fixed-X bootstrap.
b. Calculate a bootstrap confidence interval, with confidence level .95, for the regression coefficient of inner thinking regressed on first person. Provide basic, Studentized, and BCa intervals. Do not do the fixed-X bootstrap; re-sample pairs of data.
c. Calculate a bootstrap confidence interval, with confidence level .95, for the R2 statistic for inner thinking regressed on first person. Provide basic and BCa intervals. Do not do the fixed-X bootstrap; re-sample pairs of data.