
Language and Communication

Lovelace textbook · CC BY-SA 4.0 · computationalcognitivescience.github.io/lovelace/home

Language is a noisy channel between minds. The speaker encodes a meaning, the channel (speech, text) introduces noise, and the listener decodes. Surprisal measures how unexpected a word is: high-surprisal words slow reading. Pragmatic inference goes beyond literal meaning: listeners reason about what the speaker chose to say and what they could have said instead.

The communication channel

Shannon's model: a sender encodes a message, transmits it through a noisy channel, and a receiver decodes it. Shannon's noisy-channel coding theorem says that reliable communication is possible at any rate below the channel capacity. Language production and comprehension fit this framework: the speaker compresses meaning into words, the listener decompresses. The same perceive-to-attend pipeline appears in non-linguistic cognition, where a salience layer filters the stream before attention allocates resources.
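Channel capacity can be made concrete with the simplest noisy channel. The sketch below (an illustrative example, not from the text) computes the capacity of a binary symmetric channel, which flips each transmitted bit with some probability; capacity is 1 bit per use minus the entropy of the noise.

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) source."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(flip_prob: float) -> float:
    """Capacity in bits per use of a binary symmetric channel
    that flips each bit independently with probability flip_prob."""
    return 1.0 - binary_entropy(flip_prob)

print(bsc_capacity(0.0))  # noiseless channel: 1 bit per use
print(bsc_capacity(0.5))  # pure noise: 0 bits per use
```

Below capacity, coding schemes exist that drive the error rate arbitrarily low; above it, no scheme can.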

[Figure: Shannon's communication channel — a sender encodes meaning into words/speech, noise is added in the channel, and the receiver decodes an inferred meaning.]

Probabilistic language models

A language model assigns probabilities to sequences of words. An n-gram model conditions on the previous n-1 words. Better models compress language more efficiently. The cross-entropy between the model's predictions and the true distribution measures how well the model captures the statistical structure of language.


Pragmatic inference

Literal meaning is just the starting point. A pragmatic listener reasons about the speaker's choice: if the speaker said "some students passed," they probably mean "not all," because a cooperative speaker would have said "all" if that were true. The Rational Speech Act (RSA) framework models this as nested Bayesian inference: the listener infers meaning by reasoning about a speaker who reasons about a literal listener.
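The "some implies not all" inference falls out of the nested reasoning directly. The sketch below is a minimal RSA model under standard assumptions (two world states, two utterances, uniform prior, the usual literal semantics in which "some" is true in both worlds but "all" only when everyone passed); the state and utterance names are illustrative.

```python
# Assumed setup: world states, utterances, and literal truth conditions.
WORLDS = ["SOME_NOT_ALL", "ALL"]
UTTERANCES = ["some", "all"]
MEANING = {("some", "SOME_NOT_ALL"): 1, ("some", "ALL"): 1,
           ("all", "SOME_NOT_ALL"): 0, ("all", "ALL"): 1}

def normalize(d):
    z = sum(d.values())
    return {k: v / z for k, v in d.items()} if z else d

def literal_listener(u):
    """L0: condition a uniform prior on the literal meaning of u."""
    return normalize({w: MEANING[(u, w)] for w in WORLDS})

def speaker(w):
    """S1: choose among true utterances in proportion to how well
    the literal listener would recover the intended world w."""
    return normalize({u: literal_listener(u)[w]
                      for u in UTTERANCES if MEANING[(u, w)]})

def pragmatic_listener(u):
    """L1: Bayesian inference over worlds, reasoning about S1."""
    return normalize({w: speaker(w).get(u, 0.0) for w in WORLDS})

print(pragmatic_listener("some"))  # mass shifts toward SOME_NOT_ALL
```

The literal listener treats "some" as ambiguous (50/50), but the pragmatic listener assigns 75% to the not-all world: a speaker in the ALL world would usually have said "all", so "some" is evidence against it.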


Notation reference

Term              Meaning
Surprisal         -log2 P(word); information content in bits
n-gram            P(word | previous n-1 words)
Cross-entropy     Expected surprisal under the true distribution
RSA               Rational Speech Acts: nested Bayesian pragmatics
Channel capacity  Maximum rate of reliable communication

Translation notes

The Lovelace textbook covers language acquisition, syntactic parsing, and semantic composition in addition to the information-theoretic and pragmatic perspectives presented here. This page focuses on the three ideas that connect most directly to the computational toolkit: surprisal as a linking hypothesis between models and reading data, n-gram models as the simplest language model, and RSA as Bayesian inference applied to communication.

Read the original: Lovelace, Chapter 7.