
Simple Linear Regression

OpenIntro Statistics · CC BY-SA 3.0 · Chapter 8

Regression fits a line y-hat = b0 + b1*x to data. The least squares method chooses b0 and b1 to minimize the sum of squared residuals. R-squared measures the fraction of the variance in y explained by the model, and inference for the slope tests whether the apparent relationship could be explained by chance alone.

[Figure: scatterplot of x vs. y showing the fitted line and residuals]

Least squares line

The slope b1 = r * (sy / sx) and the intercept b0 = y-bar - b1 * x-bar. These minimize the sum of squared residuals. The correlation r measures linear association; the regression line passes through (x-bar, y-bar).
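These formulas can be applied directly once the sample statistics are computed. A minimal Python sketch on a small made-up dataset (the values are illustrative, not from the chapter):

```python
import math

# Toy dataset (hypothetical values chosen for illustration).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample standard deviations and the correlation r.
sx = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
sy = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * sx * sy)

# Least squares estimates: b1 = r * (sy / sx), b0 = y_bar - b1 * x_bar.
b1 = r * (sy / sx)
b0 = y_bar - b1 * x_bar

print(f"b1 = {b1:.3f}, b0 = {b0:.3f}")
```

On Python 3.10+ the result can be cross-checked against `statistics.linear_regression(x, y)` from the standard library.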


Residuals

A residual is the observed value minus the predicted value: e = y - y-hat. Residuals should show no pattern when plotted against x or y-hat. Patterns indicate the linear model is missing something: curvature, heteroscedasticity, or outliers.
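Least squares residuals also satisfy two algebraic identities that make useful sanity checks: they sum to zero, and they are uncorrelated with x. A sketch on hypothetical data with an already-fitted line:

```python
# Hypothetical data with a least squares line already fitted to it.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
b0, b1 = 0.14, 1.96  # least squares coefficients for these points

# Residual = observed minus predicted: e = y - y_hat.
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

# Sanity checks: residuals sum to (numerically) zero
# and have zero cross-product with x.
sum_e = sum(residuals)
sum_ex = sum(ei * xi for ei, xi in zip(residuals, x))
```

If either quantity is far from zero, the line was not fit by least squares (or there is a bug upstream).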


R-squared

R-squared = 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squares. It measures the proportion of variance in y explained by x. An R-squared of 0.95 means the regression captures 95% of the variability in y. In simple linear regression it equals the square of the correlation coefficient r.
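The decomposition translates directly into code. A sketch, continuing with a hypothetical dataset and its least squares fit:

```python
# Hypothetical data and its least squares fit.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
b0, b1 = 0.14, 1.96  # least squares coefficients for these points
y_bar = sum(y) / len(y)

# SSE: squared error left over after the fit; SST: total squared variation in y.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)

r_squared = 1 - sse / sst
```

For these points R-squared comes out close to 1, reflecting a nearly linear relationship.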


Inference for the slope

To test H0: beta1 = 0 (no linear relationship), compute t = b1 / SE(b1), where SE(b1) = sqrt(MSE / SS_xx) and MSE = SSE / (n - 2). The statistic follows a t-distribution with n - 2 degrees of freedom, and the confidence interval is b1 +/- t* * SE(b1).
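A Python sketch of the test on hypothetical data; the critical value 3.182 is the 0.975 quantile of the t-distribution with 3 degrees of freedom, taken from a standard t-table:

```python
import math

# Hypothetical data and its least squares fit.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
b0, b1 = 0.14, 1.96  # least squares coefficients for these points
n = len(x)
x_bar = sum(x) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)             # mean squared error, n - 2 degrees of freedom
se_b1 = math.sqrt(mse / ss_xx)  # standard error of the slope
t = b1 / se_b1                  # test statistic for H0: beta1 = 0

# 95% CI: t* = 3.182 for n - 2 = 3 df (from a t-table).
t_star = 3.182
ci = (b1 - t_star * se_b1, b1 + t_star * se_b1)
```

Here the interval excludes zero, so H0 would be rejected at the 5% level.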


Prediction intervals

A confidence interval targets the mean response at x. A prediction interval targets a single new observation. The prediction interval is wider because it adds individual-level noise on top of estimation uncertainty: SE_pred = sqrt(MSE * (1 + 1/n + (x - x-bar)^2 / SS_xx)).
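The two standard errors differ only by the leading 1 inside the square root, which is the individual-level noise term. A sketch comparing them at one x value, on hypothetical data:

```python
import math

# Hypothetical data and its least squares fit.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
b0, b1 = 0.14, 1.96  # least squares coefficients for these points
n = len(x)
x_bar = sum(x) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

x_new = 4.0  # point at which to estimate/predict

# SE for the mean response (confidence interval) omits the leading 1;
# SE for a single new observation (prediction interval) includes it.
se_mean = math.sqrt(mse * (1 / n + (x_new - x_bar) ** 2 / ss_xx))
se_pred = math.sqrt(mse * (1 + 1 / n + (x_new - x_bar) ** 2 / ss_xx))
```

Both standard errors grow as x_new moves away from x-bar, and the prediction interval is always the wider of the two.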


Notation reference

Symbol    Formula            Meaning
b1        r * (sy / sx)      Slope of regression line
b0        ȳ - b1 * x̄         Intercept
e         y - ŷ              Residual
R²        1 - SSE/SST        Coefficient of determination
SE(b1)    √(MSE / SS_xx)     Standard error of slope

Cross-references

  • Linear Algebra Ch.1 — solving systems of equations, the algebraic backbone of least squares
  • 🤖 ML Ch.2 — linear regression as a machine learning problem: gradient descent vs. the normal equation
  • ∫ Calculus Ch.12 — least-squares minimization uses the same optimization techniques
