Caches All the Way Down
Part of the cognition series.
In software, we say “everything’s a wrapper.” An ORM wraps SQL, which wraps disk I/O, which wraps silicon. Each layer exposes the same four verbs (create, read, update, delete) and delegates to the layer below. Wrappers all the way down.
But CRUD is only the Remember interface. Store, retrieve, update, delete: that’s the API to the persistent store. When we say “wrapper” we’re seeing one role out of six and calling it the whole thing.
The rest of the pipeline
SQL has WHERE (Filter) and ORDER BY (Attend). The ORM above it has scopes (Filter) and eager loading (Attend). The API above that has authorization (Filter) and pagination (Attend). The frontend above that has conditional rendering (Filter) and sort/highlight (Attend).
Every layer re-implements the full pipeline over the data from the layer below. We say “just a wrapper” because Remember looks identical at every level. The other roles look different, so we don’t notice the repetition.
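The repetition is easy to see in code. A minimal sketch, with made-up row data: the database expresses Filter and Attend as WHERE and ORDER BY, and the application layer re-implements the same two roles over whatever rows came back.

```python
# Hypothetical rows; the point is the two layers, not the data.
rows = [
    {"id": 1, "status": "active", "score": 7},
    {"id": 2, "status": "archived", "score": 9},
    {"id": 3, "status": "active", "score": 3},
]

# Database layer: Filter is WHERE, Attend is ORDER BY / LIMIT.
sql = "SELECT * FROM items WHERE status = 'active' ORDER BY score DESC LIMIT 10"

# Application layer: the same roles, re-implemented over the result set.
visible = [r for r in rows if r["status"] == "active"]  # Filter
visible.sort(key=lambda r: r["score"], reverse=True)    # Attend
page = visible[:10]                                     # Attend (pagination)
```

Only the Remember calls look identical across the two layers; Filter and Attend change syntax at each level, which is why the repetition hides.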
The computing stack
Digital computing is the clearest case because we built it from the floor up. The entire hardware-software stack is the Cache tower made visible, each level adding capacity until the full pipeline emerges.
| Level | Cache capacity | Perceive | Cache | Filter | Attend | Consolidate | Remember |
|---|---|---|---|---|---|---|---|
| Transistor | 1 bit | Voltage | — | Threshold gate | — | — | — |
| Logic gate | few bits | Input lines | Transistors | Boolean function | — | — | Output line |
| ALU | word | Operands, opcode | Logic gates, registers | Overflow, flags | Opcode selects operation | — | Result register |
| CPU | KB (L1) | Fetch instruction | ALUs, pipeline stages | Branch prediction | Scheduling, out-of-order | Branch predictor learns | Register file, L1 cache |
| OS | GB (RAM) | Interrupts, I/O | CPUs, memory hierarchy | Cache eviction | Scheduler dispatch | Defrag, compaction | Filesystem, swap |
| Database | TB (disk) | Query arrives | OS filesystem, B-trees | WHERE clause | ORDER BY, LIMIT | VACUUM, reindex | The table on disk |
| Backend | app memory | Request arrives | Database, ORM | Auth, validations | Pagination, sorting | Schema migrations | Database write |
| Frontend | viewport | User event | Backend responses, DOM | Conditional rendering | Sort, highlight, focus | User preferences | localStorage, DOM state |
| Container | image layers | Build context | Applications, runtime | .dockerignore, multi-stage discard | Layer ordering for cache hits | Image optimization | Image in registry |
| Kubernetes | cluster | Desired state, metrics | Containers, etcd | Admission controllers, resource quotas | Scheduler: affinity, constraints | Operator reconciliation loops | Cluster state |
| Autoscaler | fleet | CPU, memory, request rate | Kubernetes clusters | Cooldown periods, min/max bounds | Scaling policy: which pool, how much | Policy tuning from history | The running fleet |
The transistor row: one bit, pure threshold gating, no Attend, no Consolidate. The bool store. The autoscaler row: full pipeline across a fleet. Each row between them added capacity, and each time it crossed a threshold, another role filled in.
Nobody designed it this way. Engineers at each level solved their local problem (“hold more items, select among them, rank the survivors”) and arrived at the same pipeline independently. Add storage, and Filter and Attend follow.
The biological stack
The same tower in a person’s energy storage. Each level caches the level below.
| Level | Cache capacity | Perceive | Cache | Filter | Attend | Consolidate | Remember |
|---|---|---|---|---|---|---|---|
| ATP | 1 bond | Substrate arrives | — | Enzyme lock-and-key | — | — | — |
| Mitochondrion | many ATP molecules | Pyruvate, O₂ | ATP molecules | Membrane potential threshold | Uncoupling proteins | — | ATP output rate |
| Cell | glycogen granules | Glucose, insulin signal | Mitochondria | Metabolic gating (hexokinase) | Energy allocation across processes | Gene expression | Glycogen, protein |
| Liver | ~100g glycogen | Blood glucose, hormones | Cells, hepatocytes | Glucokinase threshold | Glycogenesis vs gluconeogenesis | Metabolic adaptation | Blood glucose level |
| Adipose / muscle | kg of fat, kg of protein | Insulin, excess energy | Liver, circulating glucose | Lipogenesis threshold | Which depots to mobilize | Set point adjustment | Fat mass |
| Mammal | total reserves | Hunger, satiety signals | Adipose, muscle | Ghrelin, leptin, appetite regulation | Meal choice, macronutrient balance | Metabolic adaptation, microbiome | Body composition |
ATP: one phosphate bond, pure enzyme gating. The bool store. The mammal row: full pipeline with hunger, choice, and metabolic adaptation. Same tower. Capacity grows, roles fill in. Evolution built each level because the one below couldn’t manage energy at the scale above.
Two substrates. Same staircase. Each level’s Cache is the level below, and each added enough capacity for another role to fill in. The shape repeats because the constraint forces it. The constraint also forces it to stop.
The tower has a floor
By induction on storage capacity.
Base case. A Cache with one bit of storage is a boolean. Pass or reject. Selection requires at least two items; one slot has nothing to compare. The only operation is threshold gating: a single if. No Attend (nothing to rank), no Consolidate (nothing to learn). The pipeline collapses to Filter alone.
Inductive step. A Cache at depth d with capacity S may contain a sub-pipeline whose sub-Cache at depth d+1 has capacity S’. Boundary 1 applies: the sub-Cache must fit inside the parent alongside the parent’s own pipeline machinery, so S’ < S. Strictly decreasing.
Termination. Capacity is a natural number. A strictly decreasing sequence of naturals terminates. It reaches 1 bit. The tower has finite depth.
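The termination argument fits in a few lines. A sketch, not a proof: the halving step is one illustrative way for a sub-Cache to be strictly smaller, and any strictly decreasing rule would bound the depth the same way.

```python
def tower_depth(capacity_bits: int) -> int:
    """Count Cache levels until capacity collapses to the 1-bit bool store."""
    assert capacity_bits >= 1
    depth = 1
    while capacity_bits > 1:
        # Each sub-Cache must fit strictly inside its parent;
        # halving stands in for any strictly decreasing rule.
        capacity_bits //= 2
        depth += 1
    return depth

tower_depth(1)  # a bare bool store: depth 1
tower_depth(8)  # 8 -> 4 -> 2 -> 1: depth 4
```

Because the sequence of capacities is a strictly decreasing sequence of naturals, the loop always terminates: the tower has finite depth.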
How deep is the universe’s cache? As deep as physics allows — down to whatever distinction is smallest. A qubit. A Planck bit. The bool store at the bottom of everything.
The Handshake proves the analogous result for Consolidate: induction on bit budget, with the data processing inequality as the decreasing measure, terminating at passthrough. Cache’s tower uses storage capacity instead, terminating at the bool store. Same structure, different measures. Consolidate is about compression. Cache is about capacity.
Bool stores in the wild
At the floor of every Cache tower, you should find a bool store doing threshold gating. And you do.
Ion channels: open or closed. One bit. Voltage threshold gates molecules through. No ranking, no learning. Pure Filter.
Transistors: on or off. Voltage above threshold passes the signal. Below, it blocks.
MHC binding: fits or doesn’t. Antigen presentation at the molecular level is a shape match — binding affinity is graded, but the groove either holds the peptide or releases it. Ranking among candidates happens one level up, where limited surface slots force selection among the fragments that passed.
Each is a Cache collapsed to a boolean. The prediction: below a bool store, no further self-similarity. You can’t have a sub-pipeline inside an if statement. If you found something smaller than a bool still doing selection, the argument would be falsified. But a bool is the minimum unit of distinction.
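What a bool store looks like as code is the whole pipeline collapsed to a single if. A sketch with an illustrative threshold value, not any real device's:

```python
THRESHOLD_V = 0.7  # illustrative gate voltage, not a real transistor's

def gate(voltage: float) -> bool:
    """Pure Filter: one threshold, one bit out. No ranking, no learning."""
    return voltage >= THRESHOLD_V

gate(1.0)  # True: signal passes
gate(0.2)  # False: signal blocked
```

There is nowhere to put a sub-pipeline: no collection to rank (no Attend), no state that outlives the call (no Consolidate). That is the floor.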
The AI stack
The same tower for AI. Read the dim cells: the roles that are sealed, minimal, or missing outright.
| Level | Cache capacity | Perceive | Cache | Filter | Attend | Consolidate | Remember |
|---|---|---|---|---|---|---|---|
| Weight | 1 float | Gradient | — | Learning rate threshold | — | — | — |
| Neuron | ~hundreds of weights | Input vector | Weights | ReLU, activation | — | Backprop (offline, sealed) | Activation output |
| Attention head | ~millions params | Query, key, value | Neurons | Softmax masking | Attention scores | Training (sealed) | Weighted value |
| Block | attention heads | Residual input | Attention heads | Layer norm | Multi-head selection | Training (sealed) | Block output |
| Model | billions of params | Token sequence | Blocks, KV cache | No input gating | No diversity enforcement | Training (sealed) | Next token |
| Context window | ~128K tokens | User prompt, tool results | Model | Minimal redundancy inhibition | Recency bias, no DPP | Ephemeral — dies with the session | Response |
| Agent | context + tools | Task, codebase | Context windows | File selection heuristics | Context window selection | Skill creation, memory files | Completed task |
| Swarm | fleet of agents | Workload | Agents | Task routing | Load balancing | No shared learning | No collective memory |
The forward pass is well-optimized at the bottom: softmax is genuine Filter, attention scores genuine Attend. But Consolidate is dim at every level. Training is sealed, so the model learns nothing from its conversation and the context window dies with the session. The agent’s memory files are a bandage, not a schema.
Above the model level, almost everything is dim. The context window has minimal redundancy inhibition; every token gets in until the window is full. The agent selects files by heuristic, not competition. The swarm has no shared memory, no collective consolidation.
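One dim cell made bright, as a sketch: a redundancy filter for a context window, so that near-duplicate chunks stop getting in. The function name, the word-overlap measure, and the 0.9 threshold are all illustrative assumptions, not how any real system works.

```python
def admit(words_seen: set[str], chunk: str) -> bool:
    """Filter for a context window: reject chunks that are mostly repetition."""
    words = set(chunk.split())
    if not words:
        return False
    overlap = len(words & words_seen) / len(words)
    if overlap > 0.9:  # illustrative redundancy threshold
        return False
    words_seen |= words  # remember what got in, so later chunks compete
    return True
```

Even this crude gate is more Filter than most context windows have today; the role exists in the table, it just hasn't been filled in.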
The computing stack filled in its dim cells over sixty years. The biology stack filled them in over four billion. The AI stack is a few years old and it shows. The dim cells are the roadmap.
The diagnostic
Active Consolidate within a Cache means there’s at least one more level below. Passthrough means you’ve hit the floor. The query optimizer learns from execution statistics, so it contains another pipeline. Ion channels don’t learn. The gate is the gate.
Every thin wrapper that’s genuinely CRUD passthrough either stays thin (it was at the floor, with nothing to learn) or grows filter and attend logic (it was above the floor, and usage pressure forced the missing roles in). Every ORM starts thin. The ones above the floor never stay that way.
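The two outcomes can be sketched side by side. Hypothetical store interface, assumed names: a passthrough wrapper at the floor, and the same wrapper after usage pressure has forced Filter and Attend in.

```python
class PassthroughStore:
    """At the floor: pure Remember delegation, nothing to learn."""
    def __init__(self, backend: dict):
        self.backend = backend

    def read(self, key):
        return self.backend.get(key)

class GrownStore(PassthroughStore):
    """Above the floor: usage pressure has added Filter and Attend."""
    def query(self, predicate, rank_key, limit):
        hits = [v for v in self.backend.values() if predicate(v)]  # Filter
        hits.sort(key=rank_key, reverse=True)                      # Attend
        return hits[:limit]                                        # Attend
```

The diagnostic in miniature: if a wrapper's interface ever sprouts a `query`, it was never really at the floor.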
It’s not wrappers all the way down. It’s pipelines — until you hit the bool.
Written via the double loop. More at pageleft.cc.