← back to programming

Storage & Garbage Collection

Abelson & Sussman · 1996 · SICP §5.3

Memory is a vector of pairs. Garbage collection reclaims unreachable storage. Stop-and-copy GC compacts live data by copying it to new space, then flipping.

before X X X live free garbage stop-and-copy after stop-and-copy: live data moves, garbage stays behind

Memory as vectors

At the lowest level, memory is a flat array of numbered slots. Lisp pairs become two parallel vectors: the-cars and the-cdrs. Index n in both vectors together form one pair. A pointer is just an index into these vectors.

Scheme

Representing lists using vectors

A list is a chain of pairs where each cdr points to the next pair (or to the empty list). Walking the list means following cdr pointers through the vector. This is exactly how linked lists work in hardware — each “next” field is a memory address.

Scheme

Stop-and-copy garbage collector

When free memory runs out, the collector copies all reachable data from “from-space” to “to-space,” then flips. Dead data stays behind. The copy is contiguous — fragmentation vanishes. The cost is proportional to live data, not total memory.

Scheme

The broken-heart algorithm

When a cell is copied to new space, a broken heart (forwarding pointer) is left in old space. If the collector encounters this cell again via another reference, it follows the forwarding pointer instead of copying a second time. This handles shared structure and cycles.

Scheme

Free list vs compacting collection

An alternative to stop-and-copy is a free list: link all unused cells into a chain, allocate from the head. Simpler but causes fragmentation. Stop-and-copy trades the cost of moving live data for the benefit of contiguous allocation — every allocation is a pointer bump.

Scheme

Notation reference

Scheme / SICP Python Meaning
(cons a b)(a, b) or [a, b]Allocate a pair
(car p) / (cdr p)p[0] / p[1]First / second element of pair
(vector-ref v i)v[i]Read from vector/array
(vector-set! v i x)v[i] = xWrite to vector/array
broken-heartforwarding dictMarker: cell already copied
from-space / to-spacefrom_* / to_*Two memory half-spaces for GC

Translation notes

Python hides all of this behind the runtime. sys.getrefcount() exposes reference counts; gc.collect() triggers the cycle collector. CPython uses reference counting (immediate collection) plus a generational collector for cycles — not stop-and-copy. The SICP model is closer to what the JVM or Go runtime does with copying/compacting collectors. The vector representation of pairs maps directly to how struct-of-arrays layouts work in systems programming.

Neighbors

Other reading pages

  • SICP Ch.19 — Register Machine Simulator (the substrate this chapter manages memory for)

Foundations (Wikipedia)

Ready for the real thing? Read SICP §5.3.