Inference for Proportions

OpenIntro Statistics · CC BY-SA 3.0 · Chapter 6

When data are categorical, inference targets proportions rather than means. The Z-test for proportions relies on the normal approximation to the binomial distribution. The chi-square test extends the idea to variables with more than two categories and to two-way tables.

One-proportion Z-test

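Test whether a sample proportion p-hat differs from a hypothesized value p0. The test statistic is Z = (p-hat - p0) / sqrt(p0 * (1 - p0) / n); the standard error uses p0 rather than p-hat because it is computed assuming the null is true. Under the null, Z is approximately standard normal when the sample is large enough: np0 and n(1 - p0) should both be at least 10 (the success-failure condition).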

Scheme

; One-proportion Z-test
; H0: p = 0.5, Ha: p != 0.5
; Sample: 560 successes out of 1000

(define p-hat (/ 560 1000))
(define p0 0.5)
(define n 1000)

(define se (sqrt (/ (* p0 (- 1 p0)) n)))
(define z (/ (- p-hat p0) se))

(display "p-hat = ") (display (exact->inexact p-hat)) (newline)
(display "SE    = ") (display se) (newline)
(display "Z     = ") (display z) (newline)
(display "Reject H0 at alpha=0.05? ")
(display (if (> (abs z) 1.96) "Yes" "No"))
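
The sample-size condition stated above (np0 and n(1 - p0) both at least 10) can be checked before running the test. The following is a minimal sketch; the helper name large-enough? is illustrative and not part of the original text.

Scheme

; Success-failure check for the normal approximation
; (large-enough? is an illustrative helper name)
(define (large-enough? n p0)
  (and (>= (* n p0) 10)          ; expected successes under H0
       (>= (* n (- 1 p0)) 10)))  ; expected failures under H0

; n = 1000, p0 = 0.5: 500 expected successes and 500 failures, so the check passes
(display (large-enough? 1000 0.5)) (newline)
; n = 30, p0 = 0.1: only about 3 expected successes, so the check fails
(display (large-enough? 30 0.1)) (newline)

When the check fails, the normal approximation is unreliable and the Z-test above should not be trusted.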

Two-proportion Z-test

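Compare proportions from two independent groups. Under the null hypothesis p1 = p2, both groups share one proportion, so pool the data to estimate it: p-pool = (x1 + x2) / (n1 + n2). The standard error is then SE = sqrt(p-pool * (1 - p-pool) * (1/n1 + 1/n2)), and the test statistic is the difference in sample proportions divided by SE.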

Scheme

; Two-proportion Z-test
; Group 1: 45 out of 100, Group 2: 58 out of 100
; H0: p1 = p2

(define x1 45) (define n1 100)
(define x2 58) (define n2 100)

(define p1 (/ x1 n1))
(define p2 (/ x2 n2))

; Pooled proportion under H0
(define p-pool (/ (+ x1 x2) (+ n1 n2)))

(define se (sqrt (+ (/ (* p-pool (- 1 p-pool)) n1)
                    (/ (* p-pool (- 1 p-pool)) n2))))

(define z (/ (- (exact->inexact p1) (exact->inexact p2)) se))

(display "p1     = ") (display (exact->inexact p1)) (newline)
(display "p2     = ") (display (exact->inexact p2)) (newline)
(display "pooled = ") (display (exact->inexact p-pool)) (newline)
(display "Z      = ") (display z) (newline)
(display "Reject H0 at alpha=0.05? ")
(display (if (> (abs z) 1.96) "Yes" "No"))
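
The same calculation can be wrapped in a function so it can be reused for other counts. This is a sketch under the pooled-SE assumption described above; the name two-prop-z is illustrative.

Scheme

; Pooled two-proportion Z statistic as a reusable function
; (two-prop-z is an illustrative name)
(define (two-prop-z x1 n1 x2 n2)
  (let* ((p1 (/ x1 n1))
         (p2 (/ x2 n2))
         ; Pooled proportion under H0: p1 = p2
         (p-pool (/ (+ x1 x2) (+ n1 n2)))
         ; SE = sqrt(p-pool * (1 - p-pool) * (1/n1 + 1/n2))
         (se (sqrt (* p-pool (- 1 p-pool)
                      (+ (/ 1 n1) (/ 1 n2))))))
    (/ (exact->inexact (- p1 p2)) se)))

(display "Z = ") (display (two-prop-z 45 100 58 100)) (newline)

For the data above (45 of 100 versus 58 of 100), Z comes out near -1.84, which is inside the +/-1.96 cutoff, so H0 is not rejected at alpha = 0.05.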

Chi-square goodness of fit

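Test whether observed counts across k categories match expected proportions. The test statistic sums (observed - expected)^2 / expected over all k cells, where each expected count is n times the hypothesized proportion. Under the null it approximately follows a chi-square distribution with k - 1 degrees of freedom, provided every expected count is at least 5.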

Scheme

; Chi-square goodness of fit
; Are dice rolls uniform? 60 rolls.
; Observed: 8, 12, 10, 14, 7, 9

(define observed (list 8 12 10 14 7 9))
(define expected (list 10 10 10 10 10 10))

; Chi-square = sum of (O - E)^2 / E
(define (chi-sq-term o e)
  (/ (* (- o e) (- o e)) e))

(define chi-sq
  (apply + (map chi-sq-term observed expected)))

(display "Observed: ") (display observed) (newline)
(display "Expected: ") (display expected) (newline)
(display "Chi-sq   = ") (display (exact->inexact chi-sq)) (newline)
(display "df       = 5") (newline)
; Critical value at alpha=0.05, df=5 is 11.07
(display "Reject H0? ")
(display (if (> chi-sq 11.07) "Yes" "No"))
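
The expected counts above were typed in by hand. As a sketch, they can instead be derived from the hypothesized proportions, along with a check of the usual rule of thumb that every expected count be at least 5. The helper names expected-counts and all-at-least? are illustrative.

Scheme

; Expected counts from hypothesized proportions
; (helper names are illustrative)
(define (expected-counts n proportions)
  (map (lambda (p) (* n p)) proportions))

; True when every element of xs is >= k
(define (all-at-least? k xs)
  (or (null? xs)
      (and (>= (car xs) k)
           (all-at-least? k (cdr xs)))))

; Fair six-sided die: each face has probability 1/6
(define fair-die (list (/ 1 6) (/ 1 6) (/ 1 6) (/ 1 6) (/ 1 6) (/ 1 6)))
(define exp-counts (expected-counts 60 fair-die))

(display "Expected: ") (display exp-counts) (newline)
(display "All expected counts >= 5? ")
(display (if (all-at-least? 5 exp-counts) "Yes" "No")) (newline)

With 60 rolls each face has an expected count of 10, so the chi-square approximation is reasonable here.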

Chi-square test for independence

Test whether two categorical variables are associated. Arrange the data in a contingency table, compute each cell's expected count under independence as (row total * column total) / n, then apply the same chi-square formula. Degrees of freedom = (rows - 1)(cols - 1).

Scheme

; Chi-square test for independence
; 2x2 contingency table:
;              Success  Failure  Total
; Treatment:     40       60      100
; Control:       30       70      100
; Total:         70      130      200

(define n 200)
; Expected = (row total * col total) / n
(define e11 (/ (* 100 70) n))   ; Treatment-Success
(define e12 (/ (* 100 130) n))  ; Treatment-Failure
(define e21 (/ (* 100 70) n))   ; Control-Success
(define e22 (/ (* 100 130) n))  ; Control-Failure

(define (chi-term o e) (/ (* (- o e) (- o e)) e))

(define chi-sq (+ (chi-term 40 e11) (chi-term 60 e12)
                  (chi-term 30 e21) (chi-term 70 e22)))

(display "Expected: ") (display (list e11 e12 e21 e22)) (newline)
(display "Chi-sq = ") (display (exact->inexact chi-sq)) (newline)
(display "df = 1") (newline)
; Critical value at alpha=0.05, df=1 is 3.84
(display "Reject H0? ")
(display (if (> chi-sq 3.84) "Yes" "No"))
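
The example above hard-codes the row and column totals. A more general sketch computes them from the table itself, so the same code handles any r x c table. The list-of-rows layout and the names table, expected-table, and chi-sq-general are assumptions made for illustration.

Scheme

; Chi-square statistic for any r x c table stored as a list of rows
; (layout and names are illustrative)
(define table (list (list 40 60)    ; Treatment: Success, Failure
                    (list 30 70)))  ; Control:   Success, Failure

(define (sum xs) (apply + xs))
(define row-totals (map sum table))
(define col-totals (apply map + table))  ; adds the rows element-wise
(define grand-total (sum row-totals))

; Expected = (row total * col total) / grand total
(define expected-table
  (map (lambda (r)
         (map (lambda (c) (/ (* r c) grand-total)) col-totals))
       row-totals))

(define (chi-term o e) (/ (* (- o e) (- o e)) e))

(define chi-sq-general
  (sum (map (lambda (obs-row exp-row)
              (sum (map chi-term obs-row exp-row)))
            table expected-table)))

; df = (rows - 1)(cols - 1)
(define df (* (- (length table) 1)
              (- (length (car table)) 1)))

(display "Expected: ") (display expected-table) (newline)
(display "Chi-sq = ") (display (exact->inexact chi-sq-general)) (newline)
(display "df = ") (display df) (newline)

For the 2x2 table above this gives a statistic of about 2.20 with df = 1, below the 3.84 critical value, which matches the hand-computed version.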

Notation reference

Symbol	Formula	Meaning
Z	(p̂ - p0) / SE	Test statistic for proportions
SE	sqrt(p0(1-p0)/n)	Standard error under H0 (one proportion)
p̂	x / n	Sample proportion
χ²	Σ(O-E)²/E	Chi-square test statistic
df	k-1	Degrees of freedom (goodness of fit)
df	(r-1)(c-1)	Degrees of freedom (independence)

by june.kim