Agentic AI: Architectures, Taxonomies, and Evaluation
Arunkumar V, Gangadharan G.R., Rajkumar Buyya · 2026 · arXiv:2601.12560
LLMs as cognitive controllers: perception, memory, planning, action, tool use, and collaboration. The full agent stack built on top of frozen weights.
What this covers
Wrap an LLM in a loop with tools, memory, and feedback, and it stops being a text generator. It becomes a cognitive controller: perceive observations, update memory, plan next steps, select actions. The weights stay frozen. All adaptation happens through the context window, external memory, or verbal self-critique.
The control loop
Four named functions run each cycle:
- Φ (Perceive): ground multimodal input. Text, screenshots, DOM, coordinates, audio, video, point clouds.
- μ (Memory update): write observations to persistent state. Retrieval, structured storage, summarization, pruning.
- Ψ (Plan): reason about what to do next. Chain, tree, or hierarchical decomposition.
- π (Act): select and execute an action. API call, code execution, tool invocation, motor command.
The cycle repeats. Each action produces an observation that feeds the next perception. Reflection can interrupt the cycle to revise the plan.
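The four functions compose into a loop. A minimal sketch, assuming a toy environment where observations and actions are strings; `ToyAgent` and `run_episode` are illustrative names, not from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class ToyAgent:
    memory: list = field(default_factory=list)

    def perceive(self, raw):        # Φ: ground raw input
        return raw.strip().lower()

    def update_memory(self, obs):   # μ: write to persistent state
        self.memory.append(obs)

    def plan(self):                 # Ψ: decide what to do next
        return "stop" if "goal" in self.memory[-1] else "explore"

    def act(self, plan):            # π: execute the selected action
        return plan

def run_episode(agent, observations):
    actions = []
    for raw in observations:
        obs = agent.perceive(raw)       # each action's result feeds
        agent.update_memory(obs)        # the next perception
        actions.append(agent.act(agent.plan()))
        if actions[-1] == "stop":
            break
    return actions
```

The weights never change; all state lives in `memory`.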
Core components
| Component | What it does | Parts bin cell |
|---|---|---|
| Perception | Multimodal grounding: text, vision, DOM, coordinates. Evolving from text-only to screenshots to video to 3D. | Perceive |
| Memory | Persistent state across episodes. Retrieval, structured storage, summarization, decay, pruning. | Cache + Remember |
| Planning | Reasoning topologies: linear chains (ReAct), branching trees (ToT), hierarchical decomposition, inference-time budgets. | Attend |
| Action + Tools | Execution: API calls → code-as-action → agent-computer interfaces → computer-use → embodied VLA. | Remember (output) |
| Reflection | Self-critique without weight updates. Store natural-language lessons, condition future attempts. | Consolidate (verbal) |
| Collaboration | Multi-agent coordination: chain, star, mesh topologies. Role-playing, debate, verification. | Attend (distributed) |
Planning topologies
| Topology | Mechanism | Example |
|---|---|---|
| Linear chain | Interleave reasoning and action. One step at a time. | ReAct |
| Branching tree | Treat thoughts as search nodes. Explore alternatives, backtrack. | Tree of Thoughts |
| Hierarchical | Decompose goal into subgoals. Each subgoal gets its own plan. | HuggingGPT, TaskWeaver |
| Internal search | Inference-time compute budgets. Search happens inside the model. | o1, o3 |
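The branching-tree row can be sketched as best-first search over thought paths, in the spirit of Tree of Thoughts. This assumes `expand` proposes candidate next thoughts and `value` scores partial paths; both are toy stand-ins for what would be LLM calls in the real system:

```python
import heapq

def tot_search(root, expand, value, is_goal, max_nodes=100):
    # Max-heap via negated scores; each entry is (neg_score, path).
    frontier = [(-value([root]), [root])]
    visited = 0
    while frontier and visited < max_nodes:
        neg_score, path = heapq.heappop(frontier)
        visited += 1
        if is_goal(path):
            return path
        for thought in expand(path):        # branch: alternative thoughts
            new_path = path + [thought]
            heapq.heappush(frontier, (-value(new_path), new_path))
    return None  # budget exhausted without an acceptable path
```

Backtracking falls out of the frontier: a low-value branch simply stops being popped.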
Action space evolution
| Paradigm | What the agent can do | Constraint level |
|---|---|---|
| API-based | Call predefined functions with typed arguments | Most constrained |
| Code-as-action | Generate and execute arbitrary code | Less constrained |
| Agent-computer interface | Curated shell: file system, terminal, browser | Moderate (deliberately re-constrained after arbitrary code) |
| Computer-use | Mouse, keyboard, screenshots: the raw desktop | Minimal |
| Embodied VLA | Continuous motor primitives from vision-language-action models | Unconstrained |
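The most constrained paradigm, API-based tool calls, can be sketched as a JSON dispatcher over a typed registry. The tool names and registry here are illustrative; the point is that anything outside the registry is rejected rather than executed:

```python
import json

TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda text: text.upper(),
}

def dispatch(call_json):
    # The agent emits JSON like {"tool": "add", "args": {"a": 2, "b": 3}}.
    call = json.loads(call_json)
    name, args = call["tool"], call["args"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")  # constrained action space
    return TOOLS[name](**args)
```

Each step down the table trades away exactly this kind of rejection check.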
Memory architectures
| System | Mechanism | Parts bin cell |
|---|---|---|
| Generative Agents | Natural language stream with reflection and summarization | Remember × sequence |
| MemoryBank | Hierarchical clusters with exponential decay | Cache × tree |
| ChatDB | Symbolic SQL tables for structured state | Cache × graph |
| MemGPT | Paged long-term memory with explicit controller-driven retrieval | Attend × sequence |
| MemInsight | Convert episodic traces into semantic insights via compression | Consolidate × sequence |
| MemAgent | Learn what to discard: policy-driven pruning | Filter × sequence |
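MemoryBank's exponential decay can be sketched as a retention score of the form `strength * exp(-age / tau)`, with entries pruned once they fall below a threshold. The time constant `tau` and the threshold are illustrative parameters, not the paper's values:

```python
import math

def retention(strength, age, tau=10.0):
    # Decayed memory strength: recent or strongly-written entries survive.
    return strength * math.exp(-age / tau)

def prune(entries, now, threshold=0.5, tau=10.0):
    # Keep entries whose decayed score still clears the threshold.
    return [e for e in entries
            if retention(e["strength"], now - e["t"], tau) >= threshold]
```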
Multi-agent topologies
| Topology | Pattern | Example |
|---|---|---|
| Chain | Sequential waterfall: each agent passes deliverables to the next | MetaGPT, ChatDev |
| Star | Hub-and-spoke: coordinator delegates to specialized workers | AutoGen, Swarm |
| Mesh | Decentralized: agents communicate dynamically, debate, simulate | CAMEL, Generative Agents |
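The star topology can be sketched as a coordinator that routes subtasks to specialized workers and collects results. Worker names and the routing rule are illustrative; in systems like AutoGen the routing decision is itself an LLM call:

```python
WORKERS = {
    # Toy workers; `eval` with empty builtins is for illustration only,
    # not a real sandbox.
    "math": lambda task: str(eval(task, {"__builtins__": {}})),
    "echo": lambda task: task,
}

def coordinate(subtasks):
    # Hub: delegate each (kind, task) pair to the matching spoke.
    results = []
    for kind, task in subtasks:
        worker = WORKERS.get(kind, WORKERS["echo"])  # fallback spoke
        results.append(worker(task))
    return results
```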
Reflection and feedback
The frozen-weights constraint means agents cannot learn by updating parameters. Instead, they learn verbally:
| Framework | What it does |
|---|---|
| ReAct | Interleave reasoning traces with actions. Each observation feeds the next thought. Linear, no backtracking. |
| Reflexion | Store natural-language critiques of failures. Condition future attempts on these lessons. Verbal reinforcement. |
| Tree of Thoughts | Explore alternative reasoning paths. Evaluate and backtrack. Global search over thought space. |
| MAKER | Hierarchical verification: verifier agents challenge worker outputs. Near-zero error on million-step chains. |
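Reflexion-style verbal reinforcement can be sketched as a lesson store whose contents condition the next attempt instead of updating weights. The `attempt` and `critique_fn` callables stand in for LLM-driven episodes and self-critique:

```python
class VerbalMemory:
    def __init__(self):
        self.lessons = []

    def record_failure(self, critique):
        self.lessons.append(critique)

    def context(self):
        # Lessons are prepended to the prompt; no parameters change.
        return "\n".join(f"Lesson: {l}" for l in self.lessons)

def run_with_reflection(attempt, critique_fn, memory, max_tries=3):
    for _ in range(max_tries):
        ok, trace = attempt(memory.context())
        if ok:
            return True
        memory.record_failure(critique_fn(trace))  # learn verbally
    return False
```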
Evaluation: the CLASS framework
The paper argues that single success-rate metrics mask critical reliability issues. Their proposed replacement:
| Dimension | What to measure |
|---|---|
| Cost | Token spend, API calls, compute budget per task |
| Latency | Time to first action, end-to-end task completion |
| Accuracy | Task success, failure severity distribution (benign vs. catastrophic) |
| Security | Prompt injection resistance, trust boundaries, audit logging |
| Stability | Run-to-run variance, infinite loop detection, error propagation |
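A CLASS-style report can be sketched as aggregation over repeated runs of the same task; stability, notably, requires multiple runs to measure at all. The field names and the choice of variance as the stability proxy are illustrative:

```python
from statistics import mean, pstdev

def class_report(runs):
    # One run = one dict of raw measurements for a single task attempt.
    successes = [1.0 if r["success"] else 0.0 for r in runs]
    return {
        "cost": mean(r["tokens"] for r in runs),
        "latency": mean(r["seconds"] for r in runs),
        "accuracy": mean(successes),
        "security": all(r["no_injection"] for r in runs),
        "stability": pstdev(successes),  # run-to-run variance
    }
```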
The memory problem
Every memory system in this survey reinvents what Soar built architecturally.
MemGPT's paged memory is Soar's working memory + semantic memory, split into tiers. The controller pages facts in and out of the context window the way Soar's retrieval pulls from SMEM into WM. Soar's retrieval uses activation (recency + frequency + spreading from context). MemGPT uses the LLM itself to decide what to page in. One is a mechanism. The other is a hope.
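The mechanism side can be sketched concretely: base-level activation sums decayed traces, so both recency and frequency raise a memory's retrieval score. The decay exponent `d` follows the classic base-level form; the values here are illustrative:

```python
import math

def base_level_activation(access_times, now, d=0.5):
    # Each past access contributes (now - t)^-d: recent accesses decay
    # less, and more accesses mean more terms in the sum.
    return math.log(sum((now - t) ** -d for t in access_times))

def retrieve(memories, now, k=1):
    # Deterministic: pull the k highest-activation entries into context.
    scored = sorted(memories.items(),
                    key=lambda kv: base_level_activation(kv[1], now),
                    reverse=True)
    return [name for name, _ in scored[:k]]
```

MemGPT replaces this scoring function with an LLM judgment call.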
And the consolidation pipeline that Soar lacks, the one the prescription calls for? MemInsight already builds it. Read episodes, detect regularities, write compressed knowledge. MemInsight uses an LLM summarizer; the prescription uses temporal graph coarsening. Same problem either way: episodes accumulate, and you need to extract what generalizes.
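The shared skeleton of that pipeline can be sketched in a few lines. Here "regularity" is just a repeated (state, action) pair with minimum support; real systems substitute LLM summarization or graph coarsening for the counting step:

```python
from collections import Counter

def consolidate(episodes, min_support=2):
    # Scan episodic traces, keep only the steps that recur: compressed
    # semantic knowledge extracted from raw experience.
    counts = Counter(step for ep in episodes for step in ep)
    return {step: n for step, n in counts.items() if n >= min_support}
```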
Then there's forgetting. MemAgent learns a pruning policy over memory entries, which is what Minsky called censors. Minsky's version suppresses the thought preceding a bad action. Soar's truth maintenance auto-retracts structures whose justification no longer holds. Three approaches to the same problem at increasing levels of architectural integration.
Reasoning is search
ReAct is Soar's decision cycle, linearized. Observe, think, act, observe. Soar runs the same loop with parallel rule firing, staged preferences, and the impasse mechanism for recursive subgoaling. ReAct has none of that. The minimal viable agent loop: one thought, one action, no backtracking.
What about backtracking? Tree of Thoughts is MCTS for reasoning. Treat intermediate thoughts as search nodes. Evaluate. Backtrack. Explore alternatives. Soar's impasse mechanism does this natively: when the decision procedure can't select an operator, a substate opens and the same cycle runs recursively. ToT reimplements this in prompt space because the transformer has no native backtracking.
And learning from failure? Reflexion is verbal chunking. Soar's chunking backtraces a dependency chain and writes a production rule. Reflexion writes a natural-language critique and appends it to the prompt. Both compile deliberation into reusable knowledge. One produces executable rules. The other produces text that the LLM might or might not attend to. The gap between "compiles to code" and "appends to prompt" is the gap between architectural and verbal Consolidate.
Hierarchical verification (MAKER) takes A-brain / B-brain to scale. Worker agents act. Verifier agents challenge. Near-zero error on million-step chains. Minsky's reflection hierarchy, distributed across multiple agents. Also how RLHF works: one model generates, another judges. Minsky named the pattern forty years before anyone built it.