← back to Theory of Computation

Regular Expressions

Maheshwari & Smid · Ch. 3 · Introduction to Theory of Computation

Regular expressions describe exactly the regular languages: union, concatenation, and Kleene star. They are equivalent to finite automata. The pumping lemma proves that some languages are beyond this level.

Syntax

A regular expression over alphabet Σ is built from three operations:

Base cases: each symbol a ∈ Σ is a regex, ε matches the empty string, and ∅ matches nothing.

· * c | a b Parse tree for (a|b)*c
Scheme
Python

Equivalence with finite automata

Two directions: every regex can be converted to an NFA (wpThompson's construction), and every DFA can be converted to a regex (state elimination). The regular languages are exactly those describable by regular expressions.

The pumping lemma

If a language is regular, then every sufficiently long string in it can be "pumped": a middle section can be repeated any number of times and the result stays in the language. Contrapositive: if you can find strings that resist pumping, the language is not regular.

Scheme
Python
Neighbors
  • 🪄 SICP Ch.9 — pattern matching in Scheme: regular expressions as recognition procedures
  • 📡 Information Theory Ch.1 — regular languages have finite entropy rate; their recognizers are minimal automata
  • 💾 Databases Ch.4 — SQL LIKE patterns and database query languages descend from regular expression theory