Agentic AI: Architectures, Taxonomies, and Evaluation

Arunkumar V, Gangadharan G.R., Rajkumar Buyya · 2026 · arXiv:2601.12560

LLMs as cognitive controllers: perception, memory, planning, action, tool use, and collaboration. The full agent stack built on top of frozen weights.

[Figure: the agentic control loop — Environment → Perceive Φ → Memory μ → Plan Ψ → Act π, with tool calls, collaboration with other agents, and each observation feeding the next cycle; reflection can interrupt to revise the plan]

What this covers

Wrap an LLM in a loop with tools, memory, and feedback, and it stops being a text generator. It becomes a cognitive controller: perceive observations, update memory, plan next steps, select actions. The weights stay frozen. All adaptation happens through the context window, external memory, or verbal self-critique.

The control loop

Four named functions run each cycle:

  1. Φ (Perceive): ground multimodal input. Text, screenshots, DOM, coordinates, audio, video, point clouds.
  2. μ (Memory update): write observations to persistent state. Retrieval, structured storage, summarization, pruning.
  3. Ψ (Plan): reason about what to do next. Chain, tree, or hierarchical decomposition.
  4. π (Act): select and execute an action. API call, code execution, tool invocation, motor command.

The cycle repeats. Each action produces an observation that feeds the next perception. Reflection can interrupt the cycle to revise the plan.
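The loop above can be sketched in a few lines of Python. This is a minimal illustration of the Φ → μ → Ψ → π cycle, not the paper's implementation; all class, method, and variable names here are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    memory: list = field(default_factory=list)

    def perceive(self, raw):   # Φ: ground raw input into an observation
        return {"obs": raw}

    def remember(self, obs):   # μ: write the observation to persistent state
        self.memory.append(obs)

    def plan(self):            # Ψ: decide the next step from memory
        return f"step-{len(self.memory)}"

    def act(self, plan):       # π: execute; the result becomes the next raw input
        return f"result of {plan}"

def run(agent, raw, cycles=3):
    # each action's result feeds the next perception: the closed loop
    for _ in range(cycles):
        obs = agent.perceive(raw)
        agent.remember(obs)
        raw = agent.act(agent.plan())
    return agent.memory

trace = run(Agent(), "initial observation")
```

In a real agent each stub would be an LLM call or tool invocation; the point is only that adaptation lives in `memory` and the loop, while the model's weights never change.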

Core components

| Component | What it does | Parts bin cell |
| --- | --- | --- |
| Perception | Multimodal grounding: text, vision, DOM, coordinates. Evolving from text-only to screenshots to video to 3D. | Perceive |
| Memory | Persistent state across episodes. Retrieval, structured storage, summarization, decay, pruning. | Cache + Remember |
| Planning | Reasoning topologies: linear chains (ReAct), branching trees (ToT), hierarchical decomposition, inference-time budgets. | Attend |
| Action + Tools | Execution: API calls → code-as-action → agent-computer interfaces → computer-use → embodied VLA. | Remember (output) |
| Reflection | Self-critique without weight updates. Store natural-language lessons, condition future attempts. | Consolidate (verbal) |
| Collaboration | Multi-agent coordination: chain, star, mesh topologies. Role-playing, debate, verification. | Attend (distributed) |

Planning topologies

[Figure: planning topologies — linear (ReAct), branching (Tree of Thoughts), hierarchical decomposition (goal → sub₁, sub₂, sub₃), internal search inside the model (o1, o3) — ordered by increasing search depth]
| Topology | Mechanism | Example |
| --- | --- | --- |
| Linear chain | Interleave reasoning and action. One step at a time. | ReAct |
| Branching tree | Treat thoughts as search nodes. Explore alternatives, backtrack. | Tree of Thoughts |
| Hierarchical | Decompose goal into subgoals. Each subgoal gets its own plan. | HuggingGPT, TaskWeaver |
| Internal search | Inference-time compute budgets. Search happens inside the model. | o1, o3 |
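The branching-tree row can be made concrete with a toy search over thoughts. This sketch uses a beam-style variant (keep the best candidates at each depth, implicitly abandoning the rest) rather than Tree of Thoughts' full DFS/BFS; `expand` and `score` stand in for LLM calls and are purely illustrative.

```python
def expand(thought):
    # propose candidate continuations of a partial solution (toy: append a char)
    return [thought + c for c in "ab"]

def score(thought):
    # value estimate for a partial thought (toy: count of 'a' characters)
    return thought.count("a")

def tree_search(root, depth=3, beam=2):
    frontier = [root]
    for _ in range(depth):
        candidates = [t for node in frontier for t in expand(node)]
        # keep only the best `beam` nodes; dropping the rest is the
        # backtracking that a linear chain like ReAct cannot do
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return frontier[0]

best = tree_search("")  # → "aaa"
```

Swapping the toy `expand`/`score` for "propose next thoughts" and "evaluate this state" prompts recovers the Tree of Thoughts pattern.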
Action space evolution

| Paradigm | What the agent can do | Constraint level |
| --- | --- | --- |
| API-based | Call predefined functions with typed arguments | Most constrained |
| Code-as-action | Generate and execute arbitrary code | Less constrained |
| Agent-computer interface | Curated shell: file system, terminal, browser | Moderate |
| Computer-use | Mouse, keyboard, screenshots: the raw desktop | Minimal |
| Embodied VLA | Continuous motor primitives from vision-language-action models | Unconstrained |
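The most constrained paradigm, API-based tool calling, amounts to a registry of typed functions with argument validation before dispatch. A minimal sketch (tool names and schemas here are hypothetical):

```python
TOOLS = {
    "add":    {"fn": lambda a, b: a + b, "args": {"a": int, "b": int}},
    "concat": {"fn": lambda x, y: x + y, "args": {"x": str, "y": str}},
}

def call_tool(name, **kwargs):
    tool = TOOLS[name]
    # enforce the typed signature before executing anything
    for arg, typ in tool["args"].items():
        if not isinstance(kwargs.get(arg), typ):
            raise TypeError(f"{name}: {arg} must be {typ.__name__}")
    return tool["fn"](**kwargs)

call_tool("add", a=2, b=3)  # → 5
```

Every step down the table trades away this kind of up-front validation for expressiveness: code-as-action skips the schema entirely, and computer-use drops even the function boundary.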
Memory architectures

| System | Mechanism | Parts bin cell |
| --- | --- | --- |
| Generative Agents | Natural language stream with reflection and summarization | Remember × sequence |
| MemoryBank | Hierarchical clusters with exponential decay | Cache × tree |
| ChatDB | Symbolic SQL tables for structured state | Cache × graph |
| MemGPT | Paged long-term memory with explicit controller-driven retrieval | Attend × sequence |
| MemInsight | Convert episodic traces into semantic insights via compression | Consolidate × sequence |
| MemAgent | Learn what to discard: policy-driven pruning | Filter × sequence |
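MemoryBank's exponential decay is the easiest of these mechanisms to sketch: each entry's retention falls with time since last access, and entries below a threshold are forgotten. The half-life and threshold here are illustrative, not values from the paper.

```python
import math

class DecayMemory:
    def __init__(self, half_life=10.0):
        # decay rate chosen so retention halves every `half_life` time units
        self.rate = math.log(2) / half_life
        self.entries = {}  # text -> last-access timestamp

    def write(self, text, now):
        self.entries[text] = now

    def retention(self, text, now):
        # exponential decay since last access
        return math.exp(-self.rate * (now - self.entries[text]))

    def prune(self, now, threshold=0.5):
        # forget entries whose retention has dropped below the threshold
        self.entries = {t: ts for t, ts in self.entries.items()
                        if self.retention(t, now) >= threshold}

m = DecayMemory()
m.write("old fact", now=0)
m.write("new fact", now=15)
m.prune(now=20)  # "old fact" is 20 units stale → retention 0.25, pruned
```

In MemoryBank recalling an entry refreshes its timestamp, so frequently used memories persist while stale ones fade — the same recency pressure that Soar-style activation encodes.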
Multi-agent topologies

| Topology | Pattern | Example |
| --- | --- | --- |
| Chain | Sequential waterfall: each agent passes deliverables to the next | MetaGPT, ChatDev |
| Star | Hub-and-spoke: coordinator delegates to specialized workers | AutoGen, Swarm |
| Mesh | Decentralized: agents communicate dynamically, debate, simulate | CAMEL, Generative Agents |
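The star topology reduces to a routing function at the hub. A toy sketch, with made-up worker roles and a trivial routing rule standing in for an LLM coordinator's delegation decision:

```python
# specialist workers; in practice each would be its own LLM agent
WORKERS = {
    "math":   lambda task: f"math-answer({task})",
    "search": lambda task: f"search-answer({task})",
}

def coordinator(task):
    # hub decides which spoke handles the task (toy heuristic: digits → math)
    role = "math" if any(ch.isdigit() for ch in task) else "search"
    return WORKERS[role](task)  # delegate, then collect the deliverable

coordinator("2+2")  # routed to the math worker
```

A chain topology would instead pipe each worker's output into the next worker's input; a mesh would let any worker message any other, with no fixed hub.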

Reflection and feedback

The frozen-weights constraint means agents cannot learn by updating parameters. Instead, they learn verbally:

| Framework | What it does |
| --- | --- |
| ReAct | Interleave reasoning traces with actions. Each observation feeds the next thought. Linear, no backtracking. |
| Reflexion | Store natural-language critiques of failures. Condition future attempts on these lessons. Verbal reinforcement. |
| Tree of Thoughts | Explore alternative reasoning paths. Evaluate and backtrack. Global search over thought space. |
| MAKER | Hierarchical verification: verifier agents challenge worker outputs. Near-zero error on million-step chains. |
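Reflexion's verbal reinforcement is the simplest to sketch: failures produce natural-language lessons, and later prompts are conditioned on them. The task and critique text here are stand-ins for real LLM calls.

```python
lessons = []  # persistent verbal memory; the weights never change

def attempt(prompt, answer_ok):
    if not answer_ok:
        # store a critique of the failure instead of updating parameters
        lessons.append(f"Lesson: previous attempt on '{prompt}' failed")
        return None
    return "success"

def build_prompt(task):
    # condition the next attempt on all accumulated lessons
    return "\n".join(lessons + [task])

attempt("solve task", answer_ok=False)
prompt = build_prompt("solve task")  # now begins with the stored lesson
```

Whether the model actually attends to those appended lessons is exactly the weakness the survey flags: verbal Consolidate is a suggestion, not a compiled rule.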

Evaluation: the CLASS framework

The paper argues that single success-rate metrics mask critical reliability issues. Their proposed replacement:

| Dimension | What to measure |
| --- | --- |
| Cost | Token spend, API calls, compute budget per task |
| Latency | Time to first action, end-to-end task completion |
| Accuracy | Task success, failure severity distribution (benign vs. catastrophic) |
| Security | Prompt injection resistance, trust boundaries, audit logging |
| Stability | Run-to-run variance, infinite loop detection, error propagation |
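A scorecard along these lines is straightforward to aggregate from per-run logs. This sketch covers four of the five dimensions (security needs adversarial probes, not log statistics); the field names and sample numbers are invented for illustration.

```python
import statistics

# hypothetical per-run records from repeated executions of one task
runs = [
    {"success": True,  "tokens": 1200, "latency_s": 3.1},
    {"success": True,  "tokens": 1800, "latency_s": 4.0},
    {"success": False, "tokens": 5000, "latency_s": 9.5},
]

def class_scorecard(runs):
    return {
        "cost":     statistics.mean(r["tokens"] for r in runs),
        "latency":  statistics.mean(r["latency_s"] for r in runs),
        "accuracy": sum(r["success"] for r in runs) / len(runs),
        # stability: run-to-run spread, which a mean success rate hides
        "stability": statistics.pstdev(r["tokens"] for r in runs),
    }

card = class_scorecard(runs)
```

The point of the extra dimensions is visible even in this toy data: a 67% success rate looks passable until the stability number reveals that the failing run also burned 3–4× the tokens.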

The memory problem

Every memory system in this survey reinvents what Soar built architecturally.

MemGPT's paged memory is Soar's working memory + semantic memory, split into tiers. The controller pages facts in and out of the context window the way Soar's retrieval pulls from SMEM into WM. Soar's retrieval uses activation (recency + frequency + spreading from context). MemGPT uses the LLM itself to decide what to page in. One is a mechanism. The other is a hope.
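The "mechanism" side of that contrast can be made concrete. Cognitive architectures in the Soar/ACT-R family score memories by base-level activation, a deterministic function of recency and frequency of access; the exact formula below follows the ACT-R convention and is illustrative, not a quote from either system.

```python
import math

def activation(accesses, now, decay=0.5):
    # base-level activation: each past access leaves a trace that decays
    # as a power law of its age; summing the traces rewards both
    # recency (young traces are large) and frequency (many traces add up)
    return math.log(sum((now - t) ** -decay for t in accesses))

recent_frequent = activation([1, 5, 9], now=10)  # accessed often, recently
old_single      = activation([1], now=10)        # accessed once, long ago
```

Retrieval then simply pages in the highest-activation items. MemGPT replaces this arithmetic with a prompt asking the LLM what to fetch — which is the "hope" half of the contrast.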

The consolidation pipeline that Soar lacks and the prescription proposed? MemInsight already builds it. Read episodes, detect regularities, write compressed knowledge. MemInsight uses an LLM summarizer. The prescription uses temporal graph coarsening. Same problem: episodes accumulate, you need to extract what generalizes.

Then there's forgetting. MemAgent learns a pruning policy over memory entries, which is what Minsky called censors. Minsky's version suppresses the thought preceding a bad action. Soar's truth maintenance auto-retracts structures whose justification no longer holds. Three approaches to the same problem at increasing levels of architectural integration.

Reasoning is search

ReAct is Soar's decision cycle, linearized. Observe, think, act, observe. Soar runs the same loop with parallel rule firing, staged preferences, and the impasse mechanism for recursive subgoaling. ReAct has none of that. The minimal viable agent loop: one thought, one action, no backtracking.

What about backtracking? Tree of Thoughts is MCTS for reasoning. Treat intermediate thoughts as search nodes. Evaluate. Backtrack. Explore alternatives. Soar's impasse mechanism does this natively: when the decision procedure can't select an operator, a substate opens and the same cycle runs recursively. ToT reimplements this in prompt space because the transformer has no native backtracking.

And learning from failure? Reflexion is verbal chunking. Soar's chunking backtraces a dependency chain and writes a production rule. Reflexion writes a natural-language critique and appends it to the prompt. Both compile deliberation into reusable knowledge. One produces executable rules. The other produces text that the LLM might or might not attend to. The gap between "compiles to code" and "appends to prompt" is the gap between architectural and verbal Consolidate.

Hierarchical verification (MAKER) takes A-brain / B-brain to scale. Worker agents act. Verifier agents challenge. Near-zero error on million-step chains. Minsky's reflection hierarchy, distributed across multiple agents. Also how RLHF works: one model generates, another judges. Minsky named the pattern forty years before anyone built it.
