Statistical inference draws conclusions about a population from a sample. It has two main tools: confidence intervals estimate a parameter with a margin of error, and hypothesis tests evaluate whether the evidence supports a claim. Both rest on the sampling distribution.
Point estimates and the sampling distribution
A point estimate is a single number computed from data (like the sample mean). Different samples give different estimates. The distribution of all possible estimates is the sampling distribution. By the 🎰 Central Limit Theorem, sample means are approximately normal for large n.
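The sampling distribution can be made concrete with a small simulation: draw many samples, compute each sample's mean, and look at how those means spread. The population below (uniform integers from 60 to 80) and the sample size are assumptions chosen for illustration, not data from this chapter.

```python
# Sketch: simulate the sampling distribution of the sample mean.
# Population is assumed: uniform integers 60-80 (true mean = 70).
import random
import statistics

random.seed(42)
population = list(range(60, 81))

# Draw 10,000 samples of size n = 30; record each sample's mean.
n = 30
means = [statistics.mean(random.choices(population, k=n))
         for _ in range(10_000)]

print(f"Population mean:         {statistics.mean(population):.2f}")
print(f"Mean of sample means:    {statistics.mean(means):.2f}")
print(f"SD of sample means (SE): {statistics.stdev(means):.2f}")
```

The mean of the sample means lands near the population mean, and their spread matches the standard error SD/sqrt(n), as the Central Limit Theorem predicts.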
Scheme
; Point estimate: sample mean as estimate of population mean
; Standard error: SD / sqrt(n)
(define sample '(67 72 74 68 71 69 73 70 75 66))
(define (mean lst)
  (exact->inexact (/ (apply + lst) (length lst))))
(define (std lst)
  (let ((m (mean lst)))
    (sqrt (/ (apply + (map (lambda (x) (* (- x m) (- x m))) lst))
             (- (length lst) 1)))))
(define x-bar (mean sample))
(define s (std sample))
(define n (length sample))
(define se (/ s (sqrt n)))
(display "Sample mean: ") (display x-bar) (newline)
(display "Sample SD: ") (display s) (newline)
(display "Sample size: ") (display n) (newline)
(display "Std error: ") (display se)
Confidence intervals
A 95% confidence interval is: point estimate ± z* × SE, where z* = 1.96 for 95% confidence. Interpretation: if we repeated the sampling many times, 95% of intervals would contain the true parameter.
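The repeated-sampling interpretation can be checked directly: sample many times from a population whose mean we know, build a 95% CI each time, and count how often the interval captures the true mean. The normal population (mean 70, SD 3), sample size, and trial count below are assumptions for illustration.

```python
# Sketch: coverage of the 95% CI under repeated sampling.
# Assumed population: Normal(mu = 70, sigma = 3).
import math
import random

random.seed(1)
true_mu, sigma, n = 70, 3, 50
trials = 2_000

covered = 0
for _ in range(trials):
    sample = [random.gauss(true_mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
    se = s / math.sqrt(n)
    lo, hi = x_bar - 1.96 * se, x_bar + 1.96 * se
    if lo <= true_mu <= hi:
        covered += 1

print(f"Coverage: {covered / trials:.3f}")  # should be close to 0.95
```

Roughly 95% of the intervals contain the true mean; any single interval either does or does not, which is why the confidence statement is about the procedure, not one interval.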
Scheme
; 95% confidence interval for the mean
; CI = x-bar +/- z* * SE
(define sample '(67 72 74 68 71 69 73 70 75 66))
(define (mean lst)
  (exact->inexact (/ (apply + lst) (length lst))))
(define (std lst)
  (let ((m (mean lst)))
    (sqrt (/ (apply + (map (lambda (x) (* (- x m) (- x m))) lst))
             (- (length lst) 1)))))
(define x-bar (mean sample))
(define se (/ (std sample) (sqrt (length sample))))
(define z-star 1.96) ; for 95% confidence
(define margin (* z-star se))
(define ci-lower (- x-bar margin))
(define ci-upper (+ x-bar margin))
(display "Point estimate: ") (display x-bar) (newline)
(display "Margin of error: ") (display margin) (newline)
(display "95% CI: (")
(display ci-lower) (display ", ")
(display ci-upper) (display ")") (newline)
(newline)
(display "We are 95% confident the population mean") (newline)
(display "falls between ") (display ci-lower)
(display " and ") (display ci-upper)
Python
# Confidence interval in Python
import math
sample = [67, 72, 74, 68, 71, 69, 73, 70, 75, 66]
n = len(sample)
x_bar = sum(sample) / n
s = math.sqrt(sum((x - x_bar)**2 for x in sample) / (n - 1))
se = s / math.sqrt(n)
z_star = 1.96  # 95% confidence
margin = z_star * se
print(f"Point estimate: {x_bar:.1f}")
print(f"Margin of error: {margin:.2f}")
print(f"95% CI: ({x_bar - margin:.2f}, {x_bar + margin:.2f})")
Hypothesis testing and p-values
Start with a null hypothesis H₀ (no effect). Compute a test statistic and its p-value: the probability of seeing data this extreme if H₀ is true. If p < α (typically 0.05), reject H₀.
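The Scheme test below stops at the critical-value comparison. As a sketch, the p-value itself can be computed from the Z-statistic with the standard normal CDF, written here via math.erf, using the same sample and null value as the test below.

```python
# Sketch: two-tailed p-value for H0: mu = 70, using this section's sample.
import math

sample = [67, 72, 74, 68, 71, 69, 73, 70, 75, 66]
mu_0 = 70
n = len(sample)
x_bar = sum(sample) / n
s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
se = s / math.sqrt(n)
z = (x_bar - mu_0) / se

def normal_cdf(x):
    # Standard normal CDF expressed with the error function.
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Two-tailed: probability of a |Z| at least this extreme if H0 is true.
p_value = 2 * (1 - normal_cdf(abs(z)))
print(f"Z = {z:.3f}, p-value = {p_value:.3f}")
```

Here the p-value is far above 0.05, so we fail to reject H0, matching the |Z| < 1.96 decision in the Scheme block.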
Scheme
; Hypothesis test: is the population mean = 70?
; H0: mu = 70
; HA: mu != 70 (two-tailed)
(define sample '(67 72 74 68 71 69 73 70 75 66))
(define mu-0 70) ; null hypothesis value
(define (mean lst)
  (exact->inexact (/ (apply + lst) (length lst))))
(define (std lst)
  (let ((m (mean lst)))
    (sqrt (/ (apply + (map (lambda (x) (* (- x m) (- x m))) lst))
             (- (length lst) 1)))))
(define x-bar (mean sample))
(define se (/ (std sample) (sqrt (length sample))))
; Test statistic: Z = (x-bar - mu0) / SE
(define z-stat (/ (- x-bar mu-0) se))
(display "Sample mean: ") (display x-bar) (newline)
(display "Null value: ") (display mu-0) (newline)
(display "Std error: ") (display se) (newline)
(display "Z-statistic: ") (display z-stat) (newline)
(newline)
; Decision at alpha = 0.05
; Reject if |Z| > 1.96
(define alpha 0.05)
(display "Critical value: 1.96") (newline)
(display "|Z| = ") (display (abs z-stat)) (newline)
(display "Reject H0? ")
(display (if (> (abs z-stat) 1.96) "Yes" "No"))
Significance level and errors
The significance level α is the threshold for rejecting H₀. Type I error: rejecting H₀ when it is true (probability = α). Type II error: failing to reject H₀ when it is false (probability = β). Power = 1 - β: the probability of correctly detecting a real effect.
Scheme
; Error types in hypothesis testing
;                  H0 true        H0 false
; Reject H0:       Type I (a)     Correct (power)
; Fail to reject:  Correct        Type II (b)
(define alpha 0.05)
(display "Significance level (alpha): ") (display alpha) (newline)
(display "P(Type I error) = alpha = ") (display alpha) (newline)
(newline)
; If the true effect is small, power is low
; Power increases with: larger sample, larger effect, larger alpha
(define beta 0.20) ; typical target
(define power (- 1 beta))
(display "P(Type II error) = beta = ") (display beta) (newline)
(display "Power = 1 - beta = ") (display power) (newline)
(newline)
; Trade-off: lowering alpha reduces Type I errors
; but increases Type II errors (lowers power)
(display "Lower alpha -> fewer false positives,") (newline)
(display " -> more false negatives")
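The trade-offs in the comments above can be quantified with a normal-approximation power calculation for the two-sided Z-test. The effect size, population SD, and sample sizes below are assumed values for illustration.

```python
# Sketch: power of the two-sided Z-test, normal approximation.
# effect, sigma, and n are hypothetical values for demonstration.
import math

def normal_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power(effect, sigma, n, z_crit):
    # Under a true mean mu_0 + effect, the Z-statistic is shifted by
    # effect / SE; power is P(|Z| > z_crit) under that shifted normal.
    shift = effect / (sigma / math.sqrt(n))
    return (1 - normal_cdf(z_crit - shift)) + normal_cdf(-z_crit - shift)

# Power grows with sample size (assumed effect = 1, sigma = 3):
for n in (10, 30, 100):
    print(f"n = {n:3d}: power = {power(1, 3, n, 1.96):.2f}")

# Trade-off: lowering alpha (raising z_crit) lowers power.
print(f"alpha = 0.05: power = {power(1, 3, 30, 1.96):.2f}")
print(f"alpha = 0.01: power = {power(1, 3, 30, 2.576):.2f}")
```

Larger samples sharply increase power, while tightening alpha from 0.05 to 0.01 reduces it, which is the false-positive/false-negative trade-off the Scheme comments describe.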
🔬 Fisher 1935 — the inventor of significance testing and p-values
🔬 Ioannidis 2005 — why the tools in this chapter are often misused
🤖 ML Ch.1 — statistical learning as inference from data
Translation notes
OpenIntro builds inference from the sampling distribution of the sample proportion, then generalizes. We use the sample mean for clarity. The original covers decision errors with medical testing analogies and uses simulation to build intuition for sampling distributions. For large samples, the Z-test is equivalent to the t-test. Small-sample inference uses the t-distribution (covered in Ch. 7).