Bootstrap-t Confidence Interval
We want to approximate the sampling distribution of a pivotal quantity with bootstrap distribution so that we could construct a confidence interval. The method is similar to approximating a sampling distribution of a pivotal quantity using a t-distribution.
The Step-By-Step Approach
-
Given a sample ๐ฎ, an attribute a(๐ฎ), and standard error \(\\widehat {SD}\[\\tilde a(\\mathcal S)\]\), calculate a(๐ฎ) and standard error \(\\widehat {SD}\[\\tilde a(\\mathcal S)\]\) based on the sample.
-
Generate B bootstrap samples s1*,โโฆ,โsB* from sample s with replacement.
-
For each of the B bootstrap samples, calculate a(๐ฎB*) and error \(\\widehat {SD}\[\\tilde a(\\mathcal S\_B^\*)\]\). Find the value of \(z\_B^\*=\\frac{a(\\mathcal S\_B^\*)-a(\\mathcal S)}{\\widehat {SD}\[\\tilde a(\\mathcal S\_B^\*)\]}\)
-
From the estimates of the sampling distribution z1*,โโฆ,โzB*, find the quantiles clowerโ=โQz(p/2) and cupperโ=โQz(1โ โโ p/2).
-
The 100(1โ โโ p)% bootstrap-t confidence interval is \(\\left (a(\\mathcal S)-c\_{upper} \\widehat {SD}\[\\tilde a(\\mathcal S)\], (a(\\mathcal S)-c\_{lower} \\widehat {SD}\[\\tilde a(\\mathcal S)\] \\right )\)
- Note the positions and signs of the clower and cupper in the above definition
Casualty Age Example
In this example, when a(๐ซ) is average age of the casualty in shark encounters, the sampling distribution of the pivotal quantity with nโ=โ8 is shown below.
First, letโs define some functions to help our future computation.
casualty <- read.csv("sharks.csv")
popCasualty <- rownames(casualty)
# Put n or N in the denominator of SD
# instead of n-1 or N-1
sdn <- function( y.pop ) {
N = length(y.pop)
sqrt( var(y.pop)*(N-1)/(N) ) }
se.avg <- function(y.sam) {
sdn(y.sam)/sqrt(length(y.sam)) * sqrt((65-length(y.sam))/(65-1) )
}
popSize <- function(pop) {
if (is.vector(pop))
{if (is.logical(pop))
## Assume TRUE values
sum(pop) else length(pop)}
else nrow(pop)
}
getSample <- function(pop, size, replace=FALSE) {
N <- popSize(pop)
pop[sample(1:N, size, replace = replace)]
}
We now generate our sampling distribution and the bootstrap estimates of the sampling distribution. Then, we can also calculate a(๐ฎ), $\widehat {SD}[\tilde a(\mathcal S)]$, a(๐ฎB*), $\widehat {SD}[\tilde a(\mathcal S_B^*)]$ and zB*.
M <- 10^5
n = 8
set.seed(3421134)
samples <- sapply(1:M, FUN =function(m) sample(popCasualty, n, replace = FALSE) )
avePop <- mean(casualty[, "Age"])
avesSamp <- apply(samples, MARGIN = 2,
FUN = function(s){mean(casualty[s,"Age"])})
SEaveSamp <- apply(samples, MARGIN = 2,
FUN = function(s){se.avg(casualty[s,"Age"])})
ZPop <- (avesSamp - mean(casualty[,"Age"]))/SEaveSamp
samCasualty <- sample(popCasualty, n, replace = FALSE)
samStar <- sapply(1:M, FUN = function(m) sample(samCasualty, n, replace = TRUE))
aveSam <- mean(casualty[samCasualty, "Age"])
avesBoot <- apply(samStar, MARGIN = 2, FUN = function(s) {
mean(casualty[s, "Age"])
})
SEaveBoot <- apply(samples, MARGIN = 2, FUN = function(s) {
se.avg(casualty[s, "Age"])
})
ZBoot <- (avesBoot - aveSam)/SEaveSamp
We have our answers. The Bootstrap-t confidence interval using $\widehat {SD}[\tilde a(\mathcal S)]$ for the average age of the casualty is
samCasualtyAges = casualty[samCasualty, "Age"]
zStar.lower = quantile(ZBoot, 0.025)
zStar.upper = quantile(ZBoot, 0.975)
round(mean(samCasualtyAges) - c(zStar.upper, zStar.lower) * se.avg(samCasualtyAges),
2)
## 97.5% 2.5%
## 19.69 40.94
And the Bootstrap-t confidence interval using $\widehat {SD}_*[\tilde a(\mathcal S)]$ for the average casualty age is
round(mean(samCasualtyAges) - quantile(ZBoot, c(0.975, 0.025)) * sd(avesBoot),
2)
## 97.5% 2.5%
## 19.04 41.55