Diagnosis: Soar
Part of the cognition series. See also: Prescription: Soar.
Soar is among the most ambitious artifacts in computer science. Where most AI research optimizes a single capability, John Laird and his collaborators spent forty years building the whole mind, taking Allen Newell’s challenge literally (Laird, 2022, §intro). The result is an architecture of extraordinary internal coherence. Every module earns its place, every mechanism connects through a single central hub, and the decision cycle stages parallel rule firing into sequential action.
Soar works, remarkably well. The question is where the architecture’s own growth edges are.
The diagnosis is based on Laird’s 2022 introduction, the Gentle Introduction (Lehman, Laird, & Rosenbloom, 2006), and correspondence with Laird.
Observations
Soar is not one pipeline. It is a set of interacting task-independent modules (§1, p.2). Figure 1 of Laird (2022) shows the structure: four memories (Procedural, Semantic, Episodic, Symbolic Working Memory), four learning modules (Chunking, RL, Semantic Learning, Episodic Learning), three processing components (Preference Memory, Decision Procedure, Operator selection), the Spatial-Visual System, and Embodiment (Perception, Motor).
Each module has its own internal pipeline: input, working memory, elaboration, selection, output, and learning. The decision cycle is the top-level pipeline that orchestrates them. Diagnosing Soar means diagnosing each module individually.
What Soar gets right
The architecture achieves things no competitor matches.
The impasse mechanism is a work of engineering. When knowledge is insufficient to select or apply an operator, Soar creates a substate and reasons about the gap (§3, p.7). The same decision cycle runs recursively in the substate, with full access to all reasoning and memory capabilities. This single mechanism unifies planning, hierarchical task decomposition, metacognition, and deliberate operator evaluation without separate meta-processing modules (§3.3, p.9). Most architectures bolt these on. Soar derives them.
The decision cycle stages elaboration before selection. Elaboration rules fire in causally dependent waves: situation elaboration, then operator proposal, then operator evaluation (§2.2, p.5). The decision procedure processes reject preferences first; if sufficient, it stops without ranking (§2.3, p.6). Rejection before ranking, achieved through causal dependencies rather than separate phases.
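The rejection-before-ranking logic can be sketched in a few lines. This is a minimal illustration with invented names, not Soar's actual decision procedure or API:

```python
# Minimal sketch of rejection-before-ranking (names are illustrative,
# not Soar's API): reject preferences filter first; ranking happens
# only if more than one candidate survives.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    rejected: bool = False
    numeric: float = 0.0  # numeric preference, consulted only if needed

def decide(candidates):
    # Phase 1: process rejections first.
    survivors = [c for c in candidates if not c.rejected]
    if len(survivors) == 1:
        return survivors[0]   # sufficient: stop without ranking
    if not survivors:
        return None           # no survivor: an impasse in real Soar
    # Phase 2: rank only the survivors.
    return max(survivors, key=lambda c: c.numeric)

ops = [Candidate("wait", rejected=True),
       Candidate("move", numeric=0.7),
       Candidate("turn", numeric=0.3)]
print(decide(ops).name)  # → move
```

In Soar the two "phases" are not explicit code paths but fall out of causal dependencies among preference types; the sketch just makes the ordering visible.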
The architecture arrived at this answer through forty years of building agents that had to work. Soar started in 1983 as a problem-solving architecture. The early agents exposed what was missing: no way to learn from deliberation (chunking was added), no way to handle uncertainty in selection (RL was added in 2005), no way to remember facts or experiences (semantic and episodic memory were added in 2006–2008), no way to reason about space (SVS was added). Each addition came from running into a wall while building a real agent, then extending the architecture to get past it. The staging of rejection before ranking wasn’t designed from theory. It was discovered by engineers who kept building agents until the decision procedure had to work that way.
Chunking is the cleanest learning mechanism in any cognitive architecture. It backtraces through the dependency chain, identifies which superstate conditions were necessary, and writes a production rule that fires directly next time (§4, p.9–10). Deliberation compiles into reaction. EBBS ensures the learned rule is correct relative to the substate reasoning and as general as possible without being over-general (§4, p.10). Among cognitive architectures, this is the most principled compiler.
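The core backtrace idea can be illustrated with a toy dependency walk. The data structures here are invented for exposition; real chunking and EBBS operate over rule instantiations and are far richer:

```python
# Toy backtrace: given which rule instantiation created each working-memory
# element, walk back from a substate result to the superstate conditions
# that were necessary. Those conditions become the new rule's left-hand side.

def backtrace(result, created_by, superstate):
    """created_by maps a WME to (rule name, condition WMEs); superstate is
    the set of WMEs that existed before the substate began."""
    needed, frontier = set(), [result]
    while frontier:
        wme = frontier.pop()
        if wme in superstate:
            needed.add(wme)           # a necessary superstate condition
        elif wme in created_by:
            _, conds = created_by[wme]
            frontier.extend(conds)    # keep tracing through the substate
    return needed

superstate = {"blocked(door)", "goal(exit)"}
created_by = {
    "subgoal(open-door)": ("propose", ["blocked(door)", "goal(exit)"]),
    "action(push)": ("apply", ["subgoal(open-door)"]),
}
chunk_conditions = backtrace("action(push)", created_by, superstate)
print(sorted(chunk_conditions))  # → ['blocked(door)', 'goal(exit)']
```

The learned rule then fires directly on those superstate conditions next time, skipping the substate entirely: deliberation compiled into reaction.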
The combinations are unique. Soar is the only architecture where (§9.3, p.17):
- RL learns retrievals from episodic memory (Gorski & Laird, 2011)
- Mental imagery simulates actions to detect collisions that inform RL (Wintermute, 2010)
- Chunking compiles planning into evaluation rules, then RL tunes the initial values (Laird, 2011)
- Episodic memory, metareasoning, and chunking combine for one-shot learning of operator evaluation knowledge (Mohan, 2015)
These emerge from the architecture. The decision cycle, working memory, and the impasse mechanism make them composable.
Real-time with millions of knowledge elements. Soar achieves a decision cycle of ~50ms even with millions of rules, facts, and episodes (§10, item 3, p.18). The RETE network’s incremental matching and episodic memory’s delta-based storage keep costs proportional to change, not to total knowledge.
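Delta-based storage is easy to see in miniature. A hypothetical sketch, not Soar's episodic-memory implementation:

```python
# Sketch of delta-based episode storage: cost per stored episode is
# proportional to what changed, not to total working-memory size.

def store_episode(log, prev_state, new_state):
    added = new_state - prev_state
    removed = prev_state - new_state
    log.append((frozenset(added), frozenset(removed)))

def reconstruct(log, upto):
    """Replay deltas to rebuild the episode at index `upto`."""
    state = set()
    for added, removed in log[:upto + 1]:
        state |= added
        state -= removed
    return state

log = []
s0, s1 = set(), {"at(door)", "holding(key)"}
s2 = {"at(door)"}                       # key dropped
store_episode(log, s0, s1)
store_episode(log, s1, s2)
print(reconstruct(log, 1))  # → {'at(door)'}
```

The trade-off the paper notes follows directly: storage stays cheap, but reconstructing old episodes means replaying more deltas, so retrieval cost grows with history.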
Demonstrated in real systems. Over the years, Soar agents have been embodied in real-world robots, computer games, and large-scale distributed simulation environments (§intro, p.1). These include:
- Rosie: learns new tasks from real-time natural language instruction, acquires task structures interactively, the most capable demonstration of Soar’s learning integration (§10, item 2, p.18; Lindes, 2022)
- Real-world robots: over 20 Soar-controlled robots with real-time decision-making, planning, and spatial reasoning via SVS (§10, item 4, p.18)
- Large-scale military simulations: agents incorporating real-time decision-making, planning, natural language understanding, metacognition, theory of mind, and mental imagery (§intro, p.1; Stearns, 2021)
- Human behavior modeling: detailed cognitive models that predict human performance (Schatz et al., 2022)
- Longevity: some agents have run uninterrupted for 30 days (§10, item 9, p.19)
Laird rates Soar on 16 capabilities derived from Newell (1990): 8 as “yes,” 5 as “partial,” and 2 as “no” (§10, p.20). These are demonstrated in deployed agents across domains.
The Decision Cycle (top-level pipeline)
All five forward phases functional. Elaboration rules compute abstractions and propose operators (§2.2.1–2.2.2). Evaluation rules create better/worse/best/worst/numeric preferences (§2.2.3); the fixed decision procedure processes rejections first, ranks survivors only if needed (§2.3). Soft-max available for numeric preferences (§5, fn.5).
The elaboration phase fires rules in parallel waves — “a common progression that starts with a wave of elaboration rule firings, followed by a wave of operator proposal, and finally a wave of operator evaluation” (§2.2, p.5). These waves are causally dependent: evaluation can’t fire until proposals exist. Laird confirmed in correspondence that “the results of this overall phase would be exactly the same if the roles were split and run sequentially.” Rejection before ranking, implemented through causal dependencies rather than explicit phases.
Symbolic Working Memory (memory)
Working memory “maintains an agent’s situational awareness, including perceptual input, intermediate reasoning results, active goals, hypothetical states, and buffers” (§1, p.2). Justification-based truth maintenance (§2.2, p.5) provides automatic retraction: I-supported structures retract when their creating rule no longer matches. Working memory doesn’t rank and doesn’t learn. That’s by design.
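Automatic retraction of I-supported structures can be modeled as a fixpoint recomputation: anything whose justification no longer matches simply isn't re-derived. A simplified sketch, not Soar's actual truth-maintenance machinery:

```python
# Sketch of justification-based retraction: an I-supported element
# persists only while its creating rule still matches.

def refresh(wm, rules):
    """Recompute derived elements each cycle; chained elaborations are
    handled by iterating to a fixpoint."""
    derived = set()
    changed = True
    while changed:
        changed = False
        for conds, conclusion in rules:
            if conds <= (wm | derived) and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return wm | derived

rules = [(frozenset({"at(door)", "holding(key)"}), "can-open(door)")]
print(refresh({"at(door)", "holding(key)"}, rules))  # includes can-open(door)
print(refresh({"at(door)"}, rules))                  # retracted: key is gone
```

When `holding(key)` leaves working memory, `can-open(door)` disappears with it, with no explicit deletion step by the agent.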
Procedural Memory (memory)
The only store with automatic learning. The RETE processes only changes to working memory — “rules fire only once for a specific match to data in working memory (this is called an instantiation)” (§2.2, p.5). No selection among rules: all matched instantiations fire in parallel. Rules don’t compete. Operators do. Chunking (§4) and RL (§5) both write to this store.
Semantic Memory (memory)
Five of six phases functional. Retrieval uses “a combination of base-level activation and spreading activation to determine the best match, as used originally in ACT-R” (§6, p.12). Base-level activation biases by recency and frequency; spreading activation biases toward concepts linked to currently active working memory structures. But: “Soar does not have an automatic learning mechanism for semantic memory, but an agent can deliberately store information at any time” (§6, p.13). The store grows only by hand or preloading (WordNet, DBpedia).
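The two activation biases compose additively in retrieval scoring. The formulas below are simplified stand-ins with an ACT-R-style shape, not Soar's or ACT-R's exact equations:

```python
# Sketch of activation-biased retrieval: base-level activation (recency
# and frequency of past access) plus spreading activation from currently
# active working-memory structures. Simplified, illustrative formulas.
import math

def base_level(access_times, now, decay=0.5):
    # Sum of decayed traces, one per past access.
    return math.log(sum((now - t) ** -decay for t in access_times))

def score(concept, links, active, access, now, spread_weight=1.0):
    spread = spread_weight * len(links[concept] & active)
    return base_level(access[concept], now) + spread

links = {"dog": {"pet", "bark"}, "wolf": {"bark", "forest"}}
access = {"dog": [1, 5, 9], "wolf": [2]}
active = {"pet"}                        # current working-memory context
best = max(links, key=lambda c: score(c, links, active, access, now=10))
print(best)  # → dog
```

"dog" wins on both counts here: it was accessed more recently and more often, and it links to a structure currently in working memory.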
Episodic Memory (memory)
Five of six phases functional. “A new episode is automatically stored at the end of each decision” (§7, p.13). “Soar minimizes the memory overhead of episodic memory by storing only the changes between episodes” (§7, p.13). But “memory does grow over time, and the cost to retrieve old episodes slowly increases as the number of episodes grows” (§7, p.13). Episodic learning “does not have generalization mechanisms” (§7, p.13).
Spatial-Visual System (memory)
Five forward phases functional. “An agent uses operators to issue commands to SVS that create filters” that “automatically extract symbolic properties” (§8, p.14). Top-down control of what gets symbolized. “SVS supports hypothetical reasoning…through the ability to project non-symbolic structures into SVS” (§8, p.14). Laird confirmed in correspondence: “There is filtering at this phase as well.” The image memory system is “still experimental” (§9.2, Figure 6).
Chunking (learning)
Five of six phases functional. Chunking “compiles the processing in a substate into rules that create the substate results” (§4, p.9). It “back-traces through the rule that created them” (§4, p.10) to find superstate conditions. EBBS — “explanation-based behavior summarization” (§4, p.10) — ensures chunks are correct and as general as possible. But “chunking requires that substate decisions be deterministic…Therefore, chunking is not used when decisions are made using numeric preferences” (§4, p.10).
Laird has the right plan: “We have plans to modify chunking so that such chunks are added to procedural memory when there is sufficient accumulated experience to ensure that they have a high probability of being correct” (§4, p.10). Gate chunking on RL convergence. The implementation doesn’t exist yet.
Reinforcement Learning (learning)
Five of six phases functional. “RL modifies selection knowledge so that an agent’s operator selections maximize future reward” (§5, p.11). “RL in Soar applies to every active substate,” a natural fit for hierarchical RL (§5, p.12). Global learning rate and discount rate are “fixed at agent initialization” (§5, fn.5). Delta-bar-delta mode adapts per-production learning rates automatically, but the global parameters and exploration strategy are static.
Semantic Learning (learning)
Not yet autonomous. Three of six phases are agent-directed, not architectural. "An agent can deliberately store information at any time" (§6, p.13), but there's no relevance gating, no prioritization, no self-update. It's a raw `store()` call. Laird himself rates semantic learning as still "missing" among "types of architectural learning" (§10, item 7, p.18).
Episodic Learning (learning)
Automatic but undiscriminating. “A new episode is automatically stored at the end of each decision” (§7, p.13). “An agent can further limit the costs of retrievals by explicitly controlling which aspects of the state are stored, usually ignoring frequently changing low-level sensory data” (§7, p.13). But no mechanism discriminates which episodes are worth keeping.
The pattern
Every memory module has a functional forward pass. Every learning module is missing its own learning mechanism.
| Stack | Type | Forward pass | Learning |
|---|---|---|---|
| Decision Cycle | Top-level | Functional | Partial |
| Symbolic Working Memory | Memory | Functional | Nil (expected) |
| Procedural Memory / RETE | Memory | Functional | Functional |
| Semantic Memory | Memory | Functional | Missing |
| Episodic Memory | Memory | Functional | Missing |
| SVS / Perceptual LTM | Memory | Functional | Partial |
| Chunking | Learning | Functional | Missing |
| Reinforcement Learning | Learning | Functional | Partial (delta-bar-delta) |
| Semantic Learning | Learning | Functional | Missing |
| Episodic Learning | Learning | Functional | Missing |
Procedural memory is the only store with automatic learning. Chunking and RL both write to it. Semantic memory, episodic memory, and perceptual LTM have no learning mechanism that writes back to them.
Laird (§10, p.20): “What I feel is most missing from Soar is its ability to ‘bootstrap’ itself up from the architecture and a set of innate knowledge into being a fully capable agent across a breadth of tasks.”
Triage
- Semantic Memory has no automatic learning. The most important store grows only by hand. §6, p.13
- Episodic Memory has no generalization. The write-ahead log never compacts. §7, p.13
- Chunking cannot compose with RL. The determinism requirement walls off stochastic selection from the compiler. §4, p.10
- Semantic Learning is three-sixths missing. No filter, no attend, no consolidate. §6; §10, p.18
- Episodic Learning records without discrimination. §7, p.13
- RL has delta-bar-delta for per-rule learning rates but global parameters and exploration strategy are static. §5, fn.5
- Forward pass at every level: functional. No action needed.
SOAP Notes
1. Semantic Memory consolidation
Subjective. Semantic memory “encodes facts that an agent ‘knows’ about itself and the world” and “serves as a knowledge base that encodes general context-independent world knowledge, but also specific knowledge about an agent’s environment, capabilities, and long-term goals” (§6, p.12). Retrieval uses activation from ACT-R (§6, p.12).
Objective. “Soar does not have an automatic learning mechanism for semantic memory, but an agent can deliberately store information at any time” (§6, p.13). Can be “initialized with knowledge from existing curated knowledge bases (such as WordNet or DBpedia) and/or built up incrementally by the agent during its operations” (§6, p.13). Base-level activation metadata updates automatically, but this biases retrieval, not knowledge creation.
Assessment. Semantic memory is a database with no ETL pipeline. The retrieval engine is sophisticated (activation-based, context-sensitive). The ingestion path is a raw INSERT. The missing piece is the batch job that reads from episodic memory (the event log) and writes regularities to semantic memory (the knowledge base). Laird acknowledges semantic learning is still “missing” among architectural learning types (§10, item 7, p.18).
Plan. An episodic-to-semantic consolidation module:
- Trigger: idle time, goal completion, or episode accumulation threshold.
- Read: query EPMEM for recent episodes within a context window.
- Detect: find co-occurring structures, recurring operator sequences, stable features across episodes.
- Write: create new SMEM graph structures encoding the detected regularities.
- Verify: on next retrieval, check the generalization against new episodes. Decay activation on generalizations that don’t match.
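The loop above can be sketched end to end. Everything here is hypothetical: the detection rule (promote features that recur across most recent episodes) is one simple stand-in for "find co-occurring structures," and the names are not Soar's API:

```python
# Sketch of the episodic-to-semantic consolidation loop above: read recent
# episodes, detect features that recur stably, write them to the semantic
# store. Detection rule and names are invented for illustration.
from collections import Counter

def consolidate(episodes, smem, min_support=0.7):
    """Promote features appearing in at least min_support of the
    recent episodes into the semantic store."""
    counts = Counter()
    for ep in episodes:
        counts.update(set(ep))            # count each feature once per episode
    threshold = min_support * len(episodes)
    for feature, n in counts.items():
        if n >= threshold:
            smem[feature] = smem.get(feature, 0) + 1  # reinforce on re-detection

episodes = [
    {"door(locked)", "key(brass)"},
    {"door(locked)", "key(brass)", "noise"},
    {"door(locked)", "key(brass)"},
    {"door(locked)", "rain"},
]
smem = {}
consolidate(episodes, smem)
print(sorted(smem))  # → ['door(locked)', 'key(brass)']
```

The verify step would then run in the other direction: retrievals that contradict new episodes decay the generalization's activation rather than deleting it outright.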
2. Chunking–RL composition
Subjective. Chunking “is a learning mechanism that converts deliberate, sequential reasoning into parallel rule firings” (§4, p.9). RL “modifies selection knowledge so that an agent’s operator selections maximize future reward” (§5, p.11). Together they should cover the full learning space.
Objective. “Chunking requires that substate decisions be deterministic so that they will always create the same result. Therefore, chunking is not used when decisions are made using numeric preferences” (§4, p.10). The two mechanisms share working memory but their learning products don’t feed each other.
Assessment. Laird has the right plan: “We have plans to modify chunking so that such chunks are added to procedural memory when there is sufficient accumulated experience to ensure that they have a high probability of being correct” (§4, p.10). Gate chunking on RL convergence. This is one learning module reading from another’s output.
The deeper issue is that Chunking has no learning mechanism for itself. EBBS (§4, p.10) improved chunk quality but doesn’t prune the chunk store. Chunks accumulate.
Plan. Two changes:
- RL-gated chunking: as Laird describes, allow chunking when RL preferences converge below a variance threshold.
- Chunk review: periodically evaluate chunk utility. Chunks that never fire can be retracted. Learning that reviews its own learned output.
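Both changes above fit in a few lines of sketch code. The convergence test (variance of recent preference values) and the utility floor are invented thresholds, not anything Soar specifies:

```python
# Sketch of the two changes above: RL-gated admission (compile a chunk only
# once the preference values it summarizes have settled) and periodic
# retraction of chunks that never fire. Thresholds are illustrative.
import statistics

def rl_converged(recent_values, variance_threshold=0.01):
    """Gate chunking on RL convergence: the numeric preference must have
    low variance before its deliberation is compiled into a rule."""
    return len(recent_values) > 1 and \
        statistics.variance(recent_values) < variance_threshold

def review_chunks(chunks, min_firings=1):
    """Retract chunks whose firing count never reached the floor."""
    return {name: c for name, c in chunks.items()
            if c["firings"] >= min_firings}

print(rl_converged([0.70, 0.71, 0.70, 0.70]))  # → True
print(rl_converged([0.2, 0.9, 0.4]))           # → False
chunks = {"open-door": {"firings": 12}, "dead-chunk": {"firings": 0}}
print(sorted(review_chunks(chunks)))           # → ['open-door']
```

The point of the second function is reflexivity: a learning mechanism whose own output is subject to review, which is exactly what chunking currently lacks.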
3. Episodic discrimination
Subjective. “A new episode is automatically stored at the end of each decision” (§7, p.13). Agents can “limit the costs of retrievals by explicitly controlling which aspects of the state are stored” (§7, p.13).
Objective. At ~50ms per decision cycle (§10, item 3, p.18), that’s 20 episodes per second, 72,000 per hour. “The cost to retrieve old episodes slowly increases as the number of episodes grows, whereas the time to retrieve recent episodes remains constant” (§7, p.13). Episodic learning “does not have generalization mechanisms” (§7, p.13).
Assessment. The missing phase is selection. An importance signal — computed from reward, impasse resolution, or state novelty — would let the agent record high-value episodes at full fidelity and routine episodes at reduced fidelity or not at all.
Plan. Add an importance gate to Episodic Learning:
- Novelty detector: compare current state to recent episodes. High delta = novel = worth recording.
- Reward proximity: episodes near reward events get full fidelity.
- Impasse resolution: episodes where an impasse was resolved contain the reasoning that worked.
- Routine suppression: states that match recent episodes within a similarity threshold are skipped or stored at reduced fidelity.
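The gate above can be sketched as one predicate over the three signals plus a novelty measure. Signals, thresholds, and the similarity measure are all illustrative assumptions:

```python
# Sketch of the importance gate above: record at full fidelity if the
# episode is novel, near a reward, or resolved an impasse; otherwise
# store at reduced fidelity. All signals and thresholds are invented.

def novelty(state, recent_states):
    """Fraction of the state not covered by the best-matching recent episode."""
    if not recent_states:
        return 1.0
    best_overlap = max(len(state & s) for s in recent_states)
    return 1.0 - best_overlap / max(len(state), 1)

def gate(state, recent_states, near_reward, resolved_impasse,
         novelty_threshold=0.5):
    if near_reward or resolved_impasse or \
            novelty(state, recent_states) > novelty_threshold:
        return "full"
    return "reduced"

recent = [{"at(hall)", "idle"}]
print(gate({"at(hall)", "idle"}, recent, False, False))    # → reduced
print(gate({"at(vault)", "alarm"}, recent, False, False))  # → full
print(gate({"at(hall)", "idle"}, recent, True, False))     # → full
```

At 20 episodes per second, even a coarse gate like this changes the growth curve: routine states stop paying full storage and retrieval cost.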
Forty years of agents running into walls, and the architecture extending itself to get past them. Every wall so far was in the forward pass. The next wall is in the learning modules. Soar has always grown by hitting that wall and building through it. This is the next one.
Mapping to the Natural Framework
This diagnosis uses Soar’s own terminology. For readers familiar with the Natural Framework, the mapping is nearly one-to-one:
| Soar | Framework | Notes |
|---|---|---|
| Input | Perceive | Input-link, SVS perception |
| Working Memory | Cache | WME graph, central hub |
| Elaboration | Filter | Situation elaboration + operator proposal |
| Selection | Attend | Operator evaluation + decision procedure |
| Output | Remember | Output-link, LTM store commands |
| Learning | Consolidate | Chunking, RL, semantic/episodic learning |
The framework’s formal treatment, including the proof that these six roles are obligatory, is at The Natural Framework.
Diagnosis based on Laird (2022), “Introduction to the Soar Cognitive Architecture.” All section references cite this paper unless noted as correspondence. Written via the double loop.