Life of Chang🪴

jotting down my reflections and learnings along the way

An Analysis of Super Bowl TV Shows

1. TV, halftime shows, and the Big Game

    # Import pandas
    import pandas as pd

    # Load the CSV data into DataFrames
    super_bowls = pd.read_csv('datasets/super_bowls.csv')
    tv = pd.read_csv('datasets/tv.csv')
    halftime_musicians = pd.read_csv('datasets/halftime_musicians.csv')

    # Display the first five rows of each DataFrame
    display(super_bowls.head())
    display(tv.head())
    display(halftime_musicians.head())

...

10 min · 2036 words · Chang Liu

Anatomy of a Significance Test

The Goal
We want to compare the difference between the attributes of two sub-populations against the differences seen in randomly mixed sub-populations, and to provide numerical evidence for that comparison.

The Null Hypothesis
The following equivalent statements express the null hypothesis H0 that we are testing.
H0: The sub-populations 𝒫1 and 𝒫2 were randomly drawn from the same population.
H0: The sub-populations 𝒫1 and 𝒫2 were created by randomly assigning units of the same population to each of 𝒫1 and 𝒫2.
H0: The sub-populations 𝒫1 and 𝒫2 were randomly generated.
Note that H0 is weaker when stated in the form a(𝒫1) = a(𝒫2), although that form is still correct. That's why we shouldn't state H0 in terms of the equality of attribute values. ...

5 min · 1061 words · Chang Liu
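
As a rough illustration of the null hypothesis in the significance-test entry above, here is a minimal sketch of a randomization test. It assumes the attribute is the mean and the discrepancy is the absolute difference of means; the names `permutation_test`, `n_shuffles`, and `seed` are made up for this example and do not come from the post.

```python
import random


def mean(units):
    """Attribute used for this sketch: the arithmetic mean."""
    return sum(units) / len(units)


def permutation_test(p1, p2, n_shuffles=2000, seed=0):
    """Estimate how often randomly mixed sub-populations differ as much as p1 and p2.

    Returns the observed discrepancy and the estimated p-value.
    """
    rng = random.Random(seed)
    observed = abs(mean(p1) - mean(p2))

    # Under H0 the labels are arbitrary, so pool the units and re-split at random.
    pooled = list(p1) + list(p2)
    n1 = len(p1)

    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        discrepancy = abs(mean(pooled[:n1]) - mean(pooled[n1:]))
        if discrepancy >= observed:
            at_least_as_extreme += 1

    return observed, at_least_as_extreme / n_shuffles


if __name__ == "__main__":
    group_a = [1, 2, 3, 4, 5]
    group_b = [101, 102, 103, 104, 105]
    observed, p_value = permutation_test(group_a, group_b)
    print(f"observed discrepancy = {observed}, estimated p-value = {p_value}")
```

A small p-value says that random mixing rarely produces a discrepancy as large as the observed one, which is evidence against H0. The shuffle count is an arbitrary choice for the sketch.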

Bootstrap-t Confidence Interval

We want to approximate the sampling distribution of a pivotal quantity with a bootstrap distribution so that we can construct a confidence interval. The method is similar to approximating the sampling distribution of a pivotal quantity with a t-distribution.

The Step-By-Step Approach
Given a sample 𝒮, an attribute a(𝒮), and its standard error $$\widehat {SD}[\tilde a(\mathcal S)]$$, calculate both a(𝒮) and $$\widehat {SD}[\tilde a(\mathcal S)]$$ from the sample. ...

3 min · 490 words · Chang Liu
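
The excerpt for the bootstrap-t entry above stops after its first step, so what follows is only a sketch of the usual textbook bootstrap-t procedure, not necessarily the post's exact steps. It assumes the attribute is the sample mean and the standard error is s/√n; `bootstrap_t_interval`, `n_boot`, and the quantile indexing rule are choices made for this illustration.

```python
import math
import random


def mean(units):
    return sum(units) / len(units)


def std_error_of_mean(units):
    """Estimated standard error of the mean: s / sqrt(n)."""
    n = len(units)
    m = mean(units)
    variance = sum((x - m) ** 2 for x in units) / (n - 1)
    return math.sqrt(variance / n)


def bootstrap_t_interval(sample, level=0.95, n_boot=2000, seed=0):
    """Approximate a confidence interval for the mean with the bootstrap-t method."""
    rng = random.Random(seed)
    n = len(sample)

    # Step 1: attribute and standard error computed from the sample itself.
    a_hat = mean(sample)
    se_hat = std_error_of_mean(sample)

    # Step 2: bootstrap distribution of the studentized (pivotal) quantity.
    z_stars = []
    for _ in range(n_boot):
        resample = [rng.choice(sample) for _ in range(n)]
        se_star = std_error_of_mean(resample)
        if se_star == 0:
            continue  # degenerate resample (all units identical); skip it
        z_stars.append((mean(resample) - a_hat) / se_star)

    # Step 3: lower and upper quantiles of the bootstrap pivotal values.
    z_stars.sort()
    alpha = 1 - level
    lower_idx = math.floor((alpha / 2) * (len(z_stars) - 1))
    upper_idx = math.ceil((1 - alpha / 2) * (len(z_stars) - 1))
    c_lower, c_upper = z_stars[lower_idx], z_stars[upper_idx]

    # Step 4: invert the pivot; note that the quantiles swap ends.
    return a_hat - c_upper * se_hat, a_hat - c_lower * se_hat


if __name__ == "__main__":
    data = [float(i) for i in range(1, 31)]
    print(bootstrap_t_interval(data))
```

The only difference from a t-based interval is that the critical values come from the resampled pivotal quantity rather than from a t table.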

Grade 5 Students in California

Standardized Test Scores of Grade 5 Students in California

Describing the Data
The data contains the average standardized test scores of grade 5 students in each school in California for the 1998-1999 school year. In addition to the scores, information is collected on the number of students enrolled in the school, the number of computers per classroom, the percentage of students in the school who qualify for a reduced-price lunch, and the percentage of students in the school whose first language is not English. ...

5 min · 864 words · Chang Liu

Statistical Sampling

Sample

Sample Definitions
A sample 𝒮 is a subset of the population. A sample has n ≪ N units. A sample attribute a(𝒮) is an estimate of the population attribute a(𝒫): $$ a(\mathcal S) = \widehat{a(\mathcal P)} = a(\hat{\mathcal P})$$ Sample error is the difference between the sample estimate a(𝒮) and the population attribute a(𝒫) (the estimand). For numerical attributes, sample error is determined mathematically. For graphical attributes, sample error is not determined precisely, but it is still conceptually applicable. In symbols, error = a(𝒮) − a(𝒫). Fisher consistency holds if, when the sample 𝒮 equals the population 𝒫, the sample error is zero, so the estimate is exact in at least that case. ...

3 min · 466 words · Chang Liu
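
To make the sample-error definition in the sampling entry above concrete, here is a small sketch that assumes a numerical attribute (the mean). The helper `sample_error` and the simulated population are invented for this example.

```python
import random


def mean(units):
    return sum(units) / len(units)


def sample_error(attribute, sample, population):
    """error = a(S) - a(P) for a numerical attribute."""
    return attribute(sample) - attribute(population)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical population of N = 1000 units; the sample has n = 30 units.
    population = [rng.gauss(50, 10) for _ in range(1000)]
    sample = rng.sample(population, 30)

    print("sample error:", sample_error(mean, sample, population))
    # Fisher consistency check: using the whole population as the sample gives zero error.
    print("error when S = P:", sample_error(mean, population, population))
```

Passing the attribute as a function keeps the same check usable for other numerical attributes, such as a median or a variance.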