Multiple regression uses several predictors to model a continuous outcome. Logistic regression models a binary outcome using the sigmoid function. Both extend simple regression: one by adding predictors, the other by changing the link function.
Multiple predictors
The model y = b0 + b1*x1 + b2*x2 + ... + bp*xp uses p predictors. Each coefficient bi represents the expected change in y for a one-unit increase in xi, holding all other predictors constant. This "holding constant" interpretation is what distinguishes it from running p separate simple regressions, whose coefficients absorb the effects of any omitted, correlated predictors.
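A small numeric illustration of the "holding constant" interpretation (the coefficients b0 = 2, b1 = 3, b2 = -1.5 are assumed for illustration, not fitted from data):

```python
# Hypothetical fitted model: y = 2 + 3*x1 - 1.5*x2 (coefficients assumed)
b0, b1, b2 = 2.0, 3.0, -1.5
predict = lambda x1, x2: b0 + b1 * x1 + b2 * x2

# Holding x2 fixed at 4, a one-unit increase in x1
# changes the prediction by exactly b1
print(predict(6, 4) - predict(5, 4))  # 3.0
```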
R-squared always increases when you add a predictor, even a useless one. Adjusted R-squared penalizes for the number of predictors: R-adj = 1 - (SSE/(n-p-1)) / (SST/(n-1)). It only increases when the new predictor improves the model more than chance would predict.
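A sketch of the penalty at work, using made-up sums of squares (n = 50 observations, SST = 200, SSE = 80 are assumed for illustration):

```python
# Adjusted R-squared penalizes for the number of predictors p
def adj_r_squared(sse, sst, n, p):
    return 1 - (sse / (n - p - 1)) / (sst / (n - 1))

n, sst, sse = 50, 200.0, 80.0     # assumed numbers for illustration
r_squared = 1 - sse / sst         # 0.60 no matter how many predictors
print(f"R^2            = {r_squared:.3f}")
print(f"adj R^2, p=3:    {adj_r_squared(sse, sst, n, 3):.3f}")
print(f"adj R^2, p=10:   {adj_r_squared(sse, sst, n, 10):.3f}")  # same fit, more predictors -> lower
```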
When predictors are correlated with each other, individual coefficients become unstable: large standard errors, sign flips, sensitivity to which observations are included. The variance inflation factor (VIF) quantifies this: VIF = 1 / (1 - R^2_j), where R^2_j is from regressing predictor j on all other predictors. VIF above 5-10 signals trouble.
Scheme
; Variance Inflation Factor
; VIF = 1 / (1 - R^2_j)
; R^2_j = how well predictor j is predicted by other predictors
; If x1 and x2 have correlation r = 0.9:
; R^2 of x1 ~ x2 is roughly r^2 = 0.81
(define (vif r-squared) (/ 1 (- 1 r-squared)))
(display "Correlation between predictors -> VIF:") (newline)
(display " r = 0.0: VIF = ") (display (vif 0.00)) (newline)
(display " r = 0.5: VIF = ") (display (vif 0.25)) (newline)
(display " r = 0.7: VIF = ") (display (vif 0.49)) (newline)
(display " r = 0.9: VIF = ") (display (vif 0.81)) (newline)
(display " r = 0.95: VIF = ") (display (vif 0.9025)) (newline)
(display "Rule of thumb: VIF > 5 is concerning")
Python
# VIF for different correlations
vif = lambda r_sq: 1 / (1 - r_sq)
for r in [0.0, 0.5, 0.7, 0.9, 0.95]:
    print(f"r = {r}: VIF = {vif(r**2):.2f}")
Logistic regression
For a binary outcome (0 or 1), linear regression can predict probabilities outside 0-1. Logistic regression fixes this by modeling the log-odds: log(p/(1-p)) = b0 + b1*x. Solving for p gives the sigmoid: p = 1 / (1 + e^-(b0+b1*x)). The coefficients represent changes in log-odds, not in probability.
Scheme
; Logistic regression: the sigmoid function
; log(p / (1-p)) = b0 + b1*x
; p = 1 / (1 + e^(-(b0 + b1*x)))
(define b0 -4)
(define b1 0.8)
(define (sigmoid z) (/ 1 (+ 1 (exp (- z)))))
(define (predict-prob x) (sigmoid (+ b0 (* b1 x))))
(display "P(y=1) at different x values:") (newline)
(define test-xs (list 0 2 4 5 6 8 10))
(for-each (lambda (x)
            (display " x = ") (display x)
            (display " -> p = ") (display (predict-prob x))
            (newline))
          test-xs)
; At x=5, the log-odds = b0 + b1*5 = -4 + 4 = 0
; So p = 0.5: this is the decision boundary
(display "Decision boundary at x = ")
(display (/ (- b0) b1))
Python
# Logistic regression
import math
b0, b1 = -4, 0.8
sigmoid = lambda z: 1 / (1 + math.exp(-z))
predict = lambda x: sigmoid(b0 + b1 * x)
print("P(y=1) at different x values:")
for x in [0, 2, 4, 5, 6, 8, 10]:
    print(f" x = {x} -> p = {predict(x):.4f}")
print(f"Decision boundary at x = {-b0/b1}")
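The coefficients above were chosen by hand; in practice they are estimated by maximizing the likelihood. A minimal sketch of that fit via gradient ascent, on a tiny made-up dataset (real software typically uses iteratively reweighted least squares):

```python
import math

# Toy data (assumed): outcome tends to flip from 0 to 1 around x = 4.5
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

sigmoid = lambda z: 1 / (1 + math.exp(-z))
b0, b1 = 0.0, 0.0
lr = 0.01  # small step size keeps the ascent stable
for _ in range(20000):
    # gradient of the log-likelihood: sum of (y - p), weighted by each predictor
    g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys))
    g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(xs, ys))
    b0 += lr * g0
    b1 += lr * g1

# the toy data are symmetric around x = 4.5, so the fitted boundary lands there
print(f"fitted boundary: x = {-b0 / b1:.2f}")
```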
Odds ratios
The odds of an event are p/(1-p). In logistic regression, e^b1 is the odds ratio: the factor by which the odds multiply for each one-unit increase in x. With b1 = 0.8, for example, the odds ratio is e^0.8 ≈ 2.23, so the odds more than double per unit increase. This is the natural scale for interpreting logistic coefficients.
Scheme
; Odds ratios
; If b1 = 0.8, then odds ratio = e^0.8
(define b1 0.8)
(define odds-ratio (exp b1))
(display "b1 = ") (display b1) (newline)
(display "Odds ratio = e^b1 = ") (display odds-ratio) (newline)
(display "Interpretation: each unit increase in x") (newline)
(display " multiplies the odds by ") (display odds-ratio) (newline)
; Example: if baseline odds = 1:3 (p=0.25)
(define baseline-odds (/ 1 3))
(display "Baseline odds: ") (display (exact->inexact baseline-odds)) (newline)
(define new-odds (* baseline-odds odds-ratio))
(display "After x+1: ") (display new-odds) (newline)
; Convert odds to probability
(define (odds->prob odds) (/ odds (+ 1 odds)))
(display "Baseline prob: ") (display (odds->prob baseline-odds)) (newline)
(display "New prob: ") (display (odds->prob new-odds))
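Python

The same odds-ratio arithmetic as the Scheme block above:

```python
import math

# Odds ratio for b1 = 0.8, mirroring the Scheme example
b1 = 0.8
odds_ratio = math.exp(b1)
odds_to_prob = lambda odds: odds / (1 + odds)

baseline_odds = 1 / 3                 # 1:3 odds, i.e. p = 0.25
new_odds = baseline_odds * odds_ratio # odds after a one-unit increase in x
print(f"odds ratio = {odds_ratio:.4f}")
print(f"baseline:  odds = {baseline_odds:.4f}, p = {odds_to_prob(baseline_odds):.4f}")
print(f"after x+1: odds = {new_odds:.4f}, p = {odds_to_prob(new_odds):.4f}")
```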