
Simple Linear Regression

OpenIntro Statistics · CC BY-SA 3.0 · Chapter 8

Regression fits a line y-hat = b0 + b1*x to data. The least squares method chooses b0 and b1 to minimize the sum of squared residuals. R-squared measures the fraction of the variance in y explained by the model, and inference for the slope tests whether the apparent relationship could be explained by chance alone.

[Figure: scatterplot of x vs. y showing the fitted line and residuals]

Least squares line

The slope b1 = r * (sy / sx) and the intercept b0 = y-bar - b1 * x-bar. These minimize the sum of squared residuals. The correlation r measures linear association; the regression line passes through (x-bar, y-bar).
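These formulas can be applied directly once the sample statistics are computed. A minimal Python sketch on a small made-up dataset (the values are illustrative, not from the chapter):

```python
import math

# Toy dataset (hypothetical values chosen for illustration).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample standard deviations and the correlation r.
sx = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
sy = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * sx * sy)

# Least squares estimates: b1 = r * (sy / sx), b0 = y_bar - b1 * x_bar.
b1 = r * (sy / sx)
b0 = y_bar - b1 * x_bar

print(f"b1 = {b1:.3f}, b0 = {b0:.3f}")
```

On Python 3.10+ the result can be cross-checked against `statistics.linear_regression(x, y)` from the standard library.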


Residuals

A residual is the observed value minus the predicted value: e = y - y-hat. Residuals should show no pattern when plotted against x or y-hat. Patterns indicate the linear model is missing something: curvature, heteroscedasticity, or outliers.
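Least squares residuals also satisfy two algebraic identities that make useful sanity checks: they sum to zero, and they are uncorrelated with x. A sketch on hypothetical data with an already-fitted line:

```python
# Hypothetical data with a least squares line already fitted to it.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
b0, b1 = 0.14, 1.96  # least squares coefficients for these points

# Residual = observed minus predicted: e = y - y_hat.
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

# Sanity checks: residuals sum to (numerically) zero
# and have zero cross-product with x.
sum_e = sum(residuals)
sum_ex = sum(ei * xi for ei, xi in zip(residuals, x))
```

If either quantity is far from zero, the line was not fit by least squares (or there is a bug upstream).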


R-squared

R-squared = 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squares. It measures the proportion of variance in y explained by x. An R-squared of 0.95 means the regression captures 95% of the variability in y. In simple linear regression it equals the square of the correlation coefficient r.
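The decomposition translates directly into code. A sketch, continuing with a hypothetical dataset and its least squares fit:

```python
# Hypothetical data and its least squares fit.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
b0, b1 = 0.14, 1.96  # least squares coefficients for these points
y_bar = sum(y) / len(y)

# SSE: squared error left over after the fit; SST: total squared variation in y.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)

r_squared = 1 - sse / sst
```

For these points R-squared comes out close to 1, reflecting a nearly linear relationship.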


Inference for the slope

To test H0: beta1 = 0 (no linear relationship), compute t = b1 / SE(b1), where SE(b1) = sqrt(MSE / SS_xx) and MSE = SSE / (n - 2). The statistic follows a t-distribution with n - 2 degrees of freedom, and the confidence interval is b1 +/- t* * SE(b1).
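A Python sketch of the test on hypothetical data; the critical value 3.182 is the 0.975 quantile of the t-distribution with 3 degrees of freedom, taken from a standard t-table:

```python
import math

# Hypothetical data and its least squares fit.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
b0, b1 = 0.14, 1.96  # least squares coefficients for these points
n = len(x)
x_bar = sum(x) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)             # mean squared error, n - 2 degrees of freedom
se_b1 = math.sqrt(mse / ss_xx)  # standard error of the slope
t = b1 / se_b1                  # test statistic for H0: beta1 = 0

# 95% CI: t* = 3.182 for n - 2 = 3 df (from a t-table).
t_star = 3.182
ci = (b1 - t_star * se_b1, b1 + t_star * se_b1)
```

Here the interval excludes zero, so H0 would be rejected at the 5% level.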


Prediction intervals

A confidence interval targets the mean response at x. A prediction interval targets a single new observation. The prediction interval is wider because it adds individual-level noise on top of estimation uncertainty: SE_pred = sqrt(MSE * (1 + 1/n + (x - x-bar)^2 / SS_xx)).
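The two standard errors differ only by the leading 1 inside the square root, which is the individual-level noise term. A sketch comparing them at one x value, on hypothetical data:

```python
import math

# Hypothetical data and its least squares fit.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
b0, b1 = 0.14, 1.96  # least squares coefficients for these points
n = len(x)
x_bar = sum(x) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

x_new = 4.0  # point at which to estimate/predict

# SE for the mean response (confidence interval) omits the leading 1;
# SE for a single new observation (prediction interval) includes it.
se_mean = math.sqrt(mse * (1 / n + (x_new - x_bar) ** 2 / ss_xx))
se_pred = math.sqrt(mse * (1 + 1 / n + (x_new - x_bar) ** 2 / ss_xx))
```

Both standard errors grow as x_new moves away from x-bar, and the prediction interval is always the wider of the two.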


Notation reference

Symbol    Formula            Meaning
b1        r * (sy / sx)      Slope of regression line
b0        ȳ - b1 * x̄         Intercept
e         y - ŷ              Residual
R²        1 - SSE/SST        Coefficient of determination
SE(b1)    √(MSE / SS_xx)     Standard error of slope

Cross-references

  • Linear Algebra Ch.1 — solving systems of equations, the algebraic backbone of least squares
  • 🤖 ML Ch.2 — linear regression as a machine learning problem: gradient descent vs. the normal equation
  • ∫ Calculus Ch.12 — least-squares minimization uses the same optimization techniques
