Sample
Sample Definitions
A sample S is a subset of the population. A sample has
n < < N units.
An sample attribute a(𝒮) is an estimate of the population
attribute a(𝒫)
\(a(\\mathcal S) = \\widehat{a(\\mathcal P)} = a(\\hat{\\mathcal P})\)
Sample error is the difference between the sample estimate a(𝒮)
and the population attribute a(𝒫) (the estimand). For numerical
attributes , sample error is determined mathematically. For graphical
attributes, sample error is not determined precisely but it is still
conceptually applicable.
error = a(𝒮) − a(𝒫)
Fisher consistency happens if the sample 𝒮 is equal to the
population 𝒫 so the sample error is zero, meaning the estimation is
sometimes consistent.
All Possible Samples Definitions
For a population 𝒫 of size N and a sample 𝒮 of size n, there is ${N \choose n}$ possible samples 𝒮 with size n.
Gernerating All Possible Samples
Use the R function combn(…) to generate all of the possible samples of size n from a population of size N
Example 1
For instance, the following example gives all subsets of size 3 from population of {A, B, C, D, E}
combn(LETTERS[1:5],3)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] "A" "A" "A" "A" "A" "A" "B" "B" "B" "C"
## [2,] "B" "B" "B" "C" "C" "D" "C" "C" "D" "D"
## [3,] "C" "D" "E" "D" "E" "E" "D" "E" "E" "E"
Example 2
samples <-combn(Australia,5)
M <-ncol(samples)
head(knitr::kable(data.frame(first = samples[,1],second = samples[,2],
third = samples[,3],fourth = samples[,4],
fifth = samples[,5],last = samples[, M])))
## [1] "first second third fourth fifth last "
## [2] "------ ------- ------ ------- ------ -----"
## [3] "1 1 1 1 1 54 "
## [4] "6 6 6 6 6 55 "
## [5] "7 7 7 7 7 58 "
## [6] "9 9 9 9 9 59 "
print(M)
## [1] 98280
A Population of Attributes
We can calculate any attribute for all of the possible samples of a population.
avesSamp <-apply(samples,MARGIN =2,FUN =function(s)
{mean(sharks[s,"Length"])})
We could also plot the results using a histogram.
Sample error
Sample error is calculated as the following \(a(\\mathcal S)- a(\\mathcal P) = \\frac{1}{n}\\sum\_{y\\in\\mathcal{S}}y\_u - \\frac{1}{N}\\sum\_{y\\in\\mathcal{P}}y\_u\) If we want to calculate the sample error using R, the following is how we approach it:
sampleErrors <-avesSamp-avePop
Consistency
Sample error depends on sample size:
-
The sample approaches the population as the the sample size increases
-
Attributes concentrate more around the population value as the the sample size increases
The consistency for an attribute is shown by the concentration around the true population value in the sample. We use the absolute difference between the sample attribute and the population attribute to quantify this concentration, \(\\lvert a(\\mathcal S) -a(\\mathcal P) \\rvert = \\lvert \\frac{1}{n}\\sum\_{y\\in\\mathcal{S}}y\_u - \\frac{1}{N}\\sum\_{y\\in\\mathcal{P}}y\_u \\rvert < c\) for c > 0