Sample

Sample Definitions

A sample 𝒮 is a subset of the population 𝒫. A sample has n ≪ N units.
A sample attribute a(𝒮) is an estimate of the population attribute a(𝒫):
$a(\mathcal S) = \widehat{a(\mathcal P)} = a(\hat{\mathcal P})$
Sample error is the difference between the sample estimate a(𝒮) and the population attribute a(𝒫) (the estimand):
error = a(𝒮) − a(𝒫)
For numerical attributes, the sample error can be computed exactly. For graphical attributes, it cannot be quantified precisely, but the concept still applies.
Fisher consistency holds if the sample error is zero whenever the sample 𝒮 equals the population 𝒫, that is, the attribute recovers a(𝒫) exactly when every unit is sampled.
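
As a minimal sketch of these definitions, the R code below computes one sample error and checks Fisher consistency with the average as the attribute. The population values and the names pop, a, s, sampleError, and fullError are illustrative assumptions, not part of the notes.

```r
# Hypothetical population of N = 5 numeric values (illustrative only)
pop <- c(2, 4, 6, 8, 10)

# The attribute used here is the average
a <- function(y) mean(y)

# A sample of n = 3 units, given as indices into the population
s <- c(1, 2, 5)

# Sample error: a(S) - a(P)
sampleError <- a(pop[s]) - a(pop)
print(sampleError)

# Fisher consistency: using the whole population as the sample gives zero error
fullError <- a(pop[seq_along(pop)]) - a(pop)
print(fullError)
```

The sample average is 16/3 and the population average is 6, so the sample underestimates the population attribute by 2/3.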

All Possible Samples Definitions

For a population 𝒫 of size N and a sample size n, there are ${N \choose n}$ possible samples 𝒮 of size n.
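
The count can be checked directly in R with choose(). The short sketch below uses sizes picked for illustration: 5 and 3 for a small case, and 28 and 5 because choose(28, 5) equals the 98280 printed in Example 2. The names nSmall, allSmall, and nLarge are assumptions.

```r
# Number of samples of size 3 from a population of size 5
nSmall <- choose(5, 3)
print(nSmall)

# combn() returns one sample per column, so the column count should match
allSmall <- combn(5, 3)
print(ncol(allSmall) == nSmall)

# The count grows quickly: samples of size 5 from a population of 28 units
nLarge <- choose(28, 5)
print(nLarge)
```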

Generating All Possible Samples

Use the R function combn(…) to generate all possible samples of size n from a population of size N.

Example 1

For instance, the following gives all subsets of size 3 from the population {A, B, C, D, E}:

combn(LETTERS[1:5],3)

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] "A"  "A"  "A"  "A"  "A"  "A"  "B"  "B"  "B"  "C"  
## [2,] "B"  "B"  "B"  "C"  "C"  "D"  "C"  "C"  "D"  "D"  
## [3,] "C"  "D"  "E"  "D"  "E"  "E"  "D"  "E"  "E"  "E"

Example 2

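# Australia is assumed to be defined earlier; each column of samples is one sample of size 5
samples <- combn(Australia, 5)
# M is the total number of possible samples
M <- ncol(samples)
# Show the first rows of the first five samples and of the last sample
head(knitr::kable(data.frame(first = samples[, 1], second = samples[, 2],
                             third = samples[, 3], fourth = samples[, 4],
                             fifth = samples[, 5], last = samples[, M])))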

## [1] "first   second   third   fourth   fifth   last "
## [2] "------  -------  ------  -------  ------  -----"
## [3] "1       1        1       1        1       54   "
## [4] "6       6        6       6        6       55   "
## [5] "7       7        7       7        7       58   "
## [6] "9       9        9       9        9       59   "

print(M)

## [1] 98280

A Population of Attributes

We can calculate any attribute for all of the possible samples of a population.

# Average shark length for every possible sample (one sample per column)
avesSamp <- apply(samples, MARGIN = 2, FUN = function(s) {
  mean(sharks[s, "Length"])
})

We could also plot the results using a histogram.
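
A minimal sketch of such a histogram follows. It does not use the sharks data: the values in toyLengths are made up, and the names toyLengths, toySamples, and toyAves are assumptions chosen so the block runs on its own.

```r
# Hypothetical population of 8 lengths (made-up values, not the sharks data)
toyLengths <- c(1.2, 1.5, 1.9, 2.3, 2.4, 3.1, 3.8, 4.4)

# All samples of size 3, one per column, as indices into the population
toySamples <- combn(length(toyLengths), 3)

# Average length for every possible sample
toyAves <- apply(toySamples, MARGIN = 2, FUN = function(s) mean(toyLengths[s]))

# Histogram of the attribute over all samples
hist(toyAves, main = "Sample averages over all samples of size 3",
     xlab = "Sample average")

# Dashed vertical line at the population average for reference
abline(v = mean(toyLengths), lty = 2)
```

With the real data, the same hist() call on avesSamp would presumably give the corresponding picture for the shark lengths.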

Sample error

For the average, the sample error is $a(\mathcal S) - a(\mathcal P) = \frac{1}{n}\sum_{u\in\mathcal{S}} y_u - \frac{1}{N}\sum_{u\in\mathcal{P}} y_u$ To calculate the sample error of every sample in R:

# avePop is assumed to hold the population average a(P), computed earlier
sampleErrors <- avesSamp - avePop
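
To make the subtraction concrete, here is a small self-contained sketch on a made-up population of five values with samples of size 2. The names toyPop, toyAvePop, toyAvesSamp, and toyErrors are assumptions that mirror avePop, avesSamp, and sampleErrors above.

```r
# Hypothetical population (made-up values)
toyPop <- c(2, 4, 6, 8, 10)
toyAvePop <- mean(toyPop)

# Average of every sample of size 2 (one sample per column of combn output)
toyAvesSamp <- apply(combn(length(toyPop), 2), MARGIN = 2,
                     FUN = function(s) mean(toyPop[s]))

# One sample error per sample, by vectorised subtraction
toyErrors <- toyAvesSamp - toyAvePop
print(toyErrors)   # -3 -2 -1 0 -1 0 1 1 2 3

# Largest underestimate and overestimate among all samples
print(range(toyErrors))
```

Because toyAvePop is a single number, R recycles it across the vector of sample averages, which is why one subtraction handles every sample at once.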

Consistency

Sample error depends on sample size:

The sample approaches the population as the sample size increases.
Attributes concentrate more tightly around the population value as the sample size increases.

Consistency of an attribute shows up as the concentration of its sample values around the true population value. We quantify this concentration with the absolute difference between the sample attribute and the population attribute, checking whether it falls below a tolerance c > 0: $\lvert a(\mathcal S) - a(\mathcal P) \rvert = \left\lvert \frac{1}{n}\sum_{u\in\mathcal{S}} y_u - \frac{1}{N}\sum_{u\in\mathcal{P}} y_u \right\rvert < c$
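
One way to see this concentration is to compute, for each sample size n, the proportion of all possible samples whose absolute error is below c. The sketch below does this for a made-up population. The values, the tolerance cTol = 1.5, and the names pop, cTol, and propWithin are illustrative assumptions, and summarising by a proportion is one possible choice rather than something the notes prescribe.

```r
# Hypothetical population (made-up values)
pop <- c(2, 4, 6, 8, 10, 12)
N <- length(pop)
avePop <- mean(pop)

# Tolerance c, chosen arbitrarily for illustration
cTol <- 1.5

# For each sample size n, the proportion of all samples with |error| < c
propWithin <- sapply(1:N, function(n) {
  aves <- apply(combn(N, n), MARGIN = 2, FUN = function(s) mean(pop[s]))
  mean(abs(aves - avePop) < cTol)
})
print(round(propWithin, 3))
```

In this toy case the proportion never decreases as n grows, and it equals 1 at n = N, where the only possible sample is the population itself.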

Statistical Sampling

Sampling in Statistics