
Summarizing Data

OpenIntro Statistics · Ch. 2 · openintro.org/book/os

Raw data is noise until you summarize it. Measures of center (mean, median), spread (standard deviation, IQR), and shape (skew, modality) compress a dataset into a few descriptors. Visualizations like histograms and box plots make the structure visible.

Mean, median, mode

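The mean is the arithmetic average: the sum of the values divided by how many there are. The median is the middle value of the sorted data (the average of the two middle values when the count is even). The mode is the most frequent value. The mean is sensitive to outliers because every value enters the sum; the median is robust because it depends only on the middle of the sorted data.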
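To make the outlier sensitivity concrete, here is a minimal sketch using Python's standard `statistics` module. The sample values are invented for illustration and do not come from the textbook.

```python
from statistics import mean, median, multimode

# Hypothetical sample, invented for illustration (not from OpenIntro).
values = [2, 3, 3, 4, 5]

# The same sample with one extreme value appended.
with_outlier = values + [100]

for label, data in [("original", values), ("with outlier", with_outlier)]:
    # multimode returns every most-frequent value, so ties are not hidden.
    print(label, "mean:", mean(data), "median:", median(data), "mode(s):", multimode(data))
```

With these numbers the single outlier moves the mean from 3.4 to 19.5, while the median moves only from 3 to 3.5 and the mode stays at 3.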

[Figure: box plot labeled with Min, Q1, Median, Q3, Max, and IQR]

Standard deviation and IQR

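Standard deviation measures the typical distance of observations from the mean: it is the square root of the variance, which averages the squared deviations from the mean. IQR (interquartile range) is Q3 minus Q1, the width of the middle 50% of the data. IQR is robust to outliers; standard deviation is not, because squaring magnifies large deviations.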
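The two spread measures can be sketched as follows. This is an illustration under stated assumptions: the data are invented, the helper names `sample_sd` and `iqr` are hypothetical, the standard deviation uses the n-1 denominator for a sample, and the quartiles come from `statistics.quantiles` with its default "exclusive" method. Quartile conventions differ between textbooks and software, so Q1 and Q3 may not match a hand calculation exactly.

```python
import math
from statistics import quantiles

def sample_sd(xs):
    """Sample standard deviation with the n-1 denominator."""
    m = sum(xs) / len(xs)
    squared_deviations = sum((x - m) ** 2 for x in xs)
    return math.sqrt(squared_deviations / (len(xs) - 1))

def iqr(xs):
    """Q3 minus Q1, using the default method of statistics.quantiles."""
    q1, _median, q3 = quantiles(xs, n=4)
    return q3 - q1

# Hypothetical sample, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
with_outlier = data + [50]

for label, xs in [("original", data), ("with outlier", with_outlier)]:
    print(label, "sd:", round(sample_sd(xs), 2), "IQR:", iqr(xs))
```

For this sample the outlier raises the standard deviation from about 2.14 to about 15.13, while the IQR moves only from 2.5 to 4.0.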


Contingency tables

For two categorical variables, a contingency table (cross-tabulation) counts how often each combination of categories occurs. Comparing row or column proportions across categories reveals whether the variables are related.
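A contingency table can be built by counting pairs. The sketch below uses only the standard library. The observations, the category names, and the helper `row_proportion` are all hypothetical and chosen for illustration.

```python
from collections import Counter

# Hypothetical (group, outcome) observations, invented for illustration.
observations = [
    ("treatment", "improved"), ("treatment", "improved"),
    ("treatment", "improved"), ("treatment", "no change"),
    ("control", "improved"), ("control", "no change"),
    ("control", "no change"), ("control", "no change"),
]

# Cell counts: how often each (row, column) combination occurs.
counts = Counter(observations)

# Row totals: how many observations fall in each group.
row_totals = Counter(group for group, _outcome in observations)

def row_proportion(group, outcome):
    """Share of a group's observations that have the given outcome."""
    return counts[(group, outcome)] / row_totals[group]

for group in ("treatment", "control"):
    for outcome in ("improved", "no change"):
        print(group, outcome, counts[(group, outcome)], row_proportion(group, outcome))
```

Here 75% of the treatment row improved against 25% of the control row. A gap like this between row proportions is what suggests an association in the sample.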


Translation notes

OpenIntro covers histograms, dot plots, and intensity maps with the county dataset. We focus on the core numerical summaries. The original also introduces the concept of robust statistics (median vs. mean) with income distribution examples. Variance here uses Bessel's correction (n-1 denominator) for sample variance, matching the textbook convention.

Want the full treatment? Read OpenIntro Statistics, Ch. 2.