← back to Theory of Computation

Context-Free Grammars

Maheshwari & Smid · Ch. 4 · Introduction to Theory of Computation

A context-free grammar (CFG) generates strings through production rules. Each rule rewrites a single variable into a sequence of variables and terminals. CFGs describe more languages than regular expressions: they can handle nesting, matching brackets, and recursive structure.

Productions and derivations

A CFG is a 4-tuple (V, Σ, R, S):

A derivation is a sequence of rule applications starting from S that produces a string of terminals.

Parse trees

A parse tree shows the structure of a derivation. The root is the start variable. Internal nodes are variables. Leaves are terminals. Reading the leaves left-to-right gives the derived string.

E E + T T 1 T * F 2 3 Parse tree for 1+2*3 (multiplication binds tighter) E → E+T | T T → T*F | F F → digit
Scheme

Ambiguity

A grammar is ambiguous if some string has two different parse trees. The classic example: E → E+E | E*E | n gives two parse trees for "1+2*3". Ambiguity is a property of the grammar, not the language. Sometimes you can find an unambiguous grammar for the same language. Sometimes you cannot: some context-free languages are inherently ambiguous.

Chomsky normal form

Every CFG can be converted to wpChomsky normal form (CNF): every production is either A → BC (two variables) or A → a (one terminal). CNF makes proofs easier and enables the wpCYK parsing algorithm (O(n3) time).

Scheme
Neighbors
  • 🪄 SICP Ch.4 — parsing and interpreters use context-free grammars
  • 🪄 SICP Ch.1 — Scheme syntax is a context-free grammar; the reader is a CFG parser
  • 🤖 ML Ch.12 — LLMs model language distributions but lack the inductive bias of CFGs
  • 🔑 Logic Ch.6 — predicate logic has a context-free grammar for its well-formed formulas