Language is a noisy channel between minds. The speaker encodes a meaning, the channel (speech, text) introduces noise, and the listener decodes. Surprisal measures how unexpected a word is: high-surprisal words slow reading. Pragmatic inference goes beyond literal meaning: listeners reason about what the speaker chose to say and what they could have said instead.
The communication channel
Shannon's model: a sender encodes a message, transmits it through a noisy channel, and a receiver decodes it. The fundamental theorem of information theory says that reliable communication is possible at any rate below the channel capacity. Language production and comprehension fit this framework: the speaker compresses meaning into words, the listener decompresses. The same perceive-to-attend pipeline appears in non-linguistic cognition, where a salience layer filters the stream before attention allocates resources.
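The capacity claim can be made concrete with the simplest textbook case, the binary symmetric channel, which flips each bit with probability eps; its capacity is 1 - H(eps) bits per use. This sketch (including the example flip probabilities) is an illustration added here, not part of Shannon's general statement.

```python
# Capacity of a binary symmetric channel (illustrative sketch).
# The flip probabilities below are made-up example values.
import math

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p == 0 or p == 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(eps):
    """C = 1 - H(eps) bits per channel use."""
    return 1 - binary_entropy(eps)

for eps in [0.0, 0.1, 0.5]:
    print(f"flip prob {eps}: capacity = {bsc_capacity(eps):.3f} bits/use")
```

A noiseless channel (eps = 0) carries one full bit per use; a coin-flip channel (eps = 0.5) carries nothing, because the output is independent of the input.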
Scheme
; Information content and surprisal
; Surprisal = -log2(P(word))
; High surprisal = unexpected word = more information
(define (log2 x) (/ (log x) (log 2)))
(define (surprisal p) (- (log2 p)))
; Word probabilities in context "The cat sat on the ___"
(define p-mat 0.15)
(define p-floor 0.10)
(define p-elephant 0.001)
(define p-the 0.30)
(display "Surprisal of 'mat': ")
(display (surprisal p-mat)) (display " bits") (newline)
(display "Surprisal of 'floor': ")
(display (surprisal p-floor)) (display " bits") (newline)
(display "Surprisal of 'elephant': ")
(display (surprisal p-elephant)) (display " bits") (newline)
(display "Surprisal of 'the': ")
(display (surprisal p-the)) (display " bits") (newline)
(newline)
(display "Higher surprisal = slower reading time (empirically)")
Python
# Surprisal in Python
import math

def surprisal(p):
    return -math.log2(p)

words = {"mat": 0.15, "floor": 0.10, "elephant": 0.001, "the": 0.30}
for word, p in words.items():
    print(f"Surprisal('{word}'): {surprisal(p):.2f} bits")
Probabilistic language models
A language model assigns probabilities to sequences of words. An n-gram model conditions on the previous n-1 words. Better models compress language more efficiently. The cross-entropy between the model's predictions and the true distribution measures how well the model captures the statistical structure of language.
# Bigram language model
from collections import defaultdict

corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog", "sat"]]

counts = defaultdict(lambda: defaultdict(int))
for sent in corpus:
    for i in range(len(sent) - 1):
        counts[sent[i]][sent[i + 1]] += 1

def bigram_prob(w1, w2):
    total = sum(counts[w1].values())
    if total == 0:
        return 0.0
    return counts[w1][w2] / total

for w1, w2 in [("the", "cat"), ("the", "dog"), ("cat", "sat"), ("cat", "ran")]:
    print(f"P({w2} | {w1}) = {bigram_prob(w1, w2):.4f}")
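The cross-entropy claim can be checked numerically: a model's average surprisal on held-out text is its cross-entropy, and text the model predicts well scores lower. The sketch below repeats the toy corpus and adds add-one smoothing so unseen bigrams do not get zero probability; the smoothing choice and test sentences are illustrative assumptions, not part of the text.

```python
# Cross-entropy of a smoothed bigram model on held-out sentences (sketch).
# Add-one smoothing and the test sentences are illustrative assumptions.
import math
from collections import defaultdict

corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog", "sat"]]
vocab = {w for sent in corpus for w in sent}

counts = defaultdict(lambda: defaultdict(int))
for sent in corpus:
    for w1, w2 in zip(sent, sent[1:]):
        counts[w1][w2] += 1

def smoothed_prob(w1, w2):
    # Add-one smoothing: every bigram gets a pseudo-count of 1
    total = sum(counts[w1].values())
    return (counts[w1][w2] + 1) / (total + len(vocab))

def cross_entropy(sentence):
    """Average surprisal in bits per word transition under the model."""
    logps = [math.log2(smoothed_prob(w1, w2))
             for w1, w2 in zip(sentence, sentence[1:])]
    return -sum(logps) / len(logps)

print(f"cross-entropy of 'the cat sat': {cross_entropy(['the', 'cat', 'sat']):.3f} bits")
print(f"cross-entropy of 'the dog ran': {cross_entropy(['the', 'dog', 'ran']):.3f} bits")
```

The sentence seen in training scores lower than the novel one, which is exactly the "better models compress more efficiently" point in the paragraph above.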
Pragmatic inference
Literal meaning is just the starting point. A pragmatic listener reasons about the speaker's choice: if the speaker said "some students passed," they probably mean "not all," because a cooperative speaker would have said "all" if that were true. The Rational Speech Act (RSA) framework models this as nested Bayesian inference: the listener infers meaning by reasoning about a speaker who reasons about a literal listener.
Scheme
; Scalar implicature: "some" implies "not all"
; RSA model (simplified)
;
; Literal listener: P(state | utterance) proportional to truth value
; Speaker: P(utterance | state) proportional to informativity
; Pragmatic listener: P(state | utterance) via Bayes over speaker
; States: "all passed" or "some passed"
; Utterances: "all" or "some"
; Literal semantics: is the utterance true in the state?
(define (literal utt state)
  (cond ((and (equal? utt "all") (equal? state "all")) 1)
        ((and (equal? utt "all") (equal? state "some")) 0)
        ((and (equal? utt "some") (equal? state "all")) 1) ; "some" is true when all
        ((and (equal? utt "some") (equal? state "some")) 1)
        (else 0)))
; Pragmatic speaker: prefers informative utterances
(define (speaker state)
  (let ((p-all (literal "all" state))
        (p-some (literal "some" state)))
    (let ((total (+ p-all p-some)))
      (list (/ p-all total) (/ p-some total)))))
; When state = "all", speaker says "all" or "some" equally
(display "Speaker given 'all': P(say all)=")
(display (car (speaker "all")))
(display " P(say some)=")
(display (cadr (speaker "all"))) (newline)
; When state = "some", speaker must say "some"
(display "Speaker given 'some': P(say all)=")
(display (car (speaker "some")))
(display " P(say some)=")
(display (cadr (speaker "some"))) (newline)
; Pragmatic listener hearing "some":
; P(all|"some") is lower because speaker would have said "all"
(display "Result: hearing 'some' implies 'not all'")
Python
# Scalar implicature via RSA (simplified)

# Literal semantics: is utterance true in state?
def literal(utt, state):
    if utt == "all":
        return 1 if state == "all" else 0
    if utt == "some":
        return 1  # "some" is true in both states

# Speaker: uniform over true utterances
def speaker(state):
    scores = {"all": literal("all", state), "some": literal("some", state)}
    total = sum(scores.values())
    return {u: s / total for u, s in scores.items()}

for state in ["all", "some"]:
    probs = speaker(state)
    print("Speaker given '" + state + "': " +
          "P(say all)=" + str(probs["all"]) +
          " P(say some)=" + str(probs["some"]))
print("Result: hearing 'some' implies 'not all'")
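The final Bayes step, the pragmatic listener, can be sketched explicitly on top of this speaker. The definitions of literal and speaker are repeated so the sketch is self-contained; the uniform prior over states is a modeling assumption, not something fixed by the text.

```python
# Pragmatic listener: P(state | utterance) via Bayes over the speaker.
# Assumes a uniform prior over states (a modeling choice in this sketch).

def literal(utt, state):
    # "all" is true only in the "all" state; "some" is true in both
    if utt == "all":
        return 1 if state == "all" else 0
    return 1

def speaker(state):
    # Uniform over utterances that are literally true in the state
    scores = {u: literal(u, state) for u in ["all", "some"]}
    total = sum(scores.values())
    return {u: s / total for u, s in scores.items()}

def pragmatic_listener(utt, prior={"all": 0.5, "some": 0.5}):
    # Bayes' rule: P(state | utt) proportional to P(state) * P(utt | state)
    scores = {s: prior[s] * speaker(s)[utt] for s in prior}
    total = sum(scores.values())
    return {s: v / total for s, v in scores.items()}

probs = pragmatic_listener("some")
print(f"P(all | 'some')  = {probs['all']:.3f}")   # pulled below the 0.5 prior
print(f"P(some | 'some') = {probs['some']:.3f}")  # the scalar implicature
```

Hearing "some" drives P(all) below the prior, because a speaker in the "all" state had the stronger utterance "all" available; this is the implicature computed quantitatively.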
Notation reference
Term              Meaning
Surprisal         -log2(P(word)); information content in bits
n-gram            P(word | previous n-1 words)
Cross-entropy     Expected surprisal under true distribution
RSA               Rational Speech Acts: nested Bayesian pragmatics
Channel capacity  Maximum rate of reliable communication
Neighbors
Shannon Ch.7 – channel capacity and the noisy channel coding theorem
Lovelace Ch.2 – the Bayesian inference that RSA builds on
The Lovelace textbook covers language acquisition, syntactic parsing, and semantic composition in addition to the information-theoretic and pragmatic perspectives presented here. This page focuses on the three ideas that connect most directly to the computational toolkit: surprisal as a linking hypothesis between models and reading data, n-gram models as the simplest language model, and RSA as Bayesian inference applied to communication.