# LLM Cognitive Architecture

## Overview
The emergence of Large Language Models (LLMs) has given rise to an entirely new cognitive architecture paradigm: using the LLM as the core "cognitive engine," combined with structured memory, tools, and control flow to build agents. This article provides an in-depth analysis of the CoALA (Cognitive Architectures for Language Agents) framework, the RAISE architecture, and how LLMs map to classical cognitive functions.
## 1. Why Do We Need LLM Cognitive Architectures?

An LLM by itself is merely a "text-to-text" function, \( \text{LLM}: \text{prompt} \mapsto \text{response} \), with no persistent state between calls.
To turn an LLM into an agent, we need to answer:
- Memory: How to extend the limited context window?
- Reasoning: How to achieve multi-step, backtrackable reasoning?
- Action: How to interact with the external world?
- Learning: How to improve from experience?
- Control: How to coordinate these components?
Cognitive architectures provide systematic answers to these questions.
## 2. The CoALA Framework

CoALA (Cognitive Architectures for Language Agents), proposed by Sumers et al. (2024), is among the most complete cognitive architecture frameworks yet proposed for LLM agents.
### 2.1 Architecture Overview

```mermaid
graph TD
    subgraph "CoALA Framework"
        subgraph "Memory System"
            WM[Working Memory]
            LTM[Long-Term Memory]
            LTM --> EP[Episodic Memory]
            LTM --> SEM[Semantic Memory]
            LTM --> PROC[Procedural Memory]
        end
        subgraph "Decision Process"
            LLM_CORE[LLM Core<br/>Reasoning Engine]
            DL[Deliberation Loop]
        end
        subgraph "Action Space"
            INT[Internal Actions<br/>Reasoning / Retrieval / Learning]
            EXT[External Actions<br/>Tool Calling / Environment Interaction]
        end
        ENV[External Environment] -->|Perception| WM
        WM --> LLM_CORE
        LTM --> WM
        LLM_CORE --> DL
        DL --> INT
        DL --> EXT
        INT -->|Update| WM
        INT -->|Store| LTM
        EXT -->|Observation| WM
        EXT --> ENV
    end
```
### 2.2 Memory System
CoALA divides LLM agent memory into three levels:
#### Working Memory
Corresponds to the LLM's context window:
| Component | Content |
|---|---|
| System prompt | Role definition, behavioral norms |
| Conversation history | User messages and assistant replies |
| Retrieved results | Relevant information from long-term memory |
| Tool output | Return results from tool calls |
| Intermediate reasoning | Thinking steps from CoT |
Working memory capacity is bounded by the LLM's context window: everything the model can attend to in a single call must fit within it.
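The table above can be made concrete as a context-assembly routine that packs components into the window under a token budget. This is a minimal sketch: the 4-characters-per-token estimate and the priority order (system prompt, then retrieved facts, then recent history) are illustrative assumptions, not part of CoALA itself.

```python
# Sketch: assemble working-memory components into one prompt under a token budget.
# The 4-chars-per-token estimate and the priority order are illustrative assumptions.

def estimate_tokens(text: str) -> int:
    """Crude token estimate (~4 characters per token for English text)."""
    return max(1, len(text) // 4)

def assemble_working_memory(system_prompt: str,
                            history: list[str],
                            retrieved: list[str],
                            budget: int = 4096) -> str:
    """Fill the context window: system prompt first, then retrieved facts,
    then as much recent conversation history as still fits."""
    parts = [system_prompt]
    used = estimate_tokens(system_prompt)
    for fact in retrieved:
        cost = estimate_tokens(fact)
        if used + cost > budget:
            break
        parts.append(fact)
        used += cost
    # Walk history from most recent backwards so the newest turns survive truncation.
    kept: list[str] = []
    for turn in reversed(history):
        cost = estimate_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    parts.extend(reversed(kept))
    return "\n".join(parts)
```

Truncating history from the oldest end is the usual default, since recent turns are most likely to matter for the next step.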
#### Long-Term Memory Subtypes
| Type | Stored Content | Implementation | Classic Counterpart |
|---|---|---|---|
| Episodic | Past interaction experiences | Vector DB + Temporal index | ACT-R Declarative Memory |
| Semantic | Facts and conceptual knowledge | Knowledge graphs / Document stores | ACT-R Declarative Memory |
| Procedural | Skills and methods | Code libraries / Tool definitions / Few-shot | SOAR Production Rules |
### 2.3 Decision Process
CoALA's decision process is a deliberation loop:
```python
while task_not_complete:
    # 1. Perception: acquire new information
    observation = perceive(environment)
    working_memory.update(observation)

    # 2. Retrieval: fetch relevant information from long-term memory
    retrieved = long_term_memory.retrieve(working_memory.query())
    working_memory.update(retrieved)

    # 3. Reasoning: the LLM reasons over working memory
    thought, action = llm.reason(working_memory)

    # 4. Action selection and execution
    if action.type == "internal":
        # Internal action: reasoning, memory update, learning
        result = execute_internal(action)
    elif action.type == "external":
        # External action: tool calling, environment interaction
        result = execute_external(action)

    # 5. Update memory
    working_memory.update(result)
    long_term_memory.store(episode=(observation, thought, action, result))
```
### 2.4 Action Space
CoALA divides agent actions into internal and external categories:
**Internal Actions** (do not change the external environment):
- Reasoning: CoT, ToT, and other thinking processes
- Retrieval: Querying information from memory or knowledge bases
- Learning: Updating knowledge in long-term memory
**External Actions** (affect the external environment):
- Tool Use: Calling APIs, executing code
- Communication: Interacting with users or other agents
- Environment Operations (Grounding): Manipulating files, browsing the web
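The internal/external split can be made concrete as a dispatch table. The action names and handlers below are illustrative assumptions, not part of the CoALA specification:

```python
# Sketch of CoALA-style action dispatch; the action names and the two
# handler branches are illustrative assumptions, not part of the framework.
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str                 # e.g. "retrieve", "tool_use"
    payload: dict = field(default_factory=dict)

INTERNAL = {"reason", "retrieve", "learn"}        # only touch the agent's memory
EXTERNAL = {"tool_use", "communicate", "ground"}  # affect the environment

def dispatch(action: Action) -> str:
    if action.name in INTERNAL:
        # Internal actions read/write the agent's own memory state.
        return f"internal:{action.name}"
    if action.name in EXTERNAL:
        # External actions go through tool execution / environment APIs.
        return f"external:{action.name}"
    raise ValueError(f"unknown action: {action.name}")
```

Keeping the split explicit makes it easy to apply safety checks only to the external branch.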
## 3. RAISE Architecture
RAISE (Reasoning, Acting, Interacting, Self-improving, Experience-learning) is another influential LLM cognitive architecture.
### 3.1 Five Core Modules

```mermaid
graph LR
    R[Reasoning Module] --> A[Acting Module]
    A --> I[Interacting Module]
    I --> S[Self-Improving]
    S --> E[Experience Learning]
    E --> R
    R -.->|CoT/ToT| R
    A -.->|Tool Calling| A
    I -.->|Environment Feedback| I
    S -.->|Reflection| S
    E -.->|Memory Storage| E
```
| Module | Function | Key Techniques |
|---|---|---|
| Reasoning | Multi-step reasoning and planning | CoT, ToT, Self-Consistency |
| Acting | Executing actions and using tools | Function Calling, Code Execution |
| Interacting | Interacting with environment and users | Dialogue management, Feedback processing |
| Self-improving | Reflection and self-correction | Reflexion, Self-Refine |
| Experience | Experience accumulation and learning | Memory storage, Case libraries |
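The cycle in the diagram can be sketched as a loop in which each reflection feeds the experience store consulted by the next reasoning pass. All five module bodies below are toy stand-ins for real LLM calls:

```python
# Toy sketch of the RAISE cycle: each pass reasons with accumulated experience,
# acts, observes feedback, reflects, and stores the lesson for the next pass.
# All module bodies are illustrative stand-ins for LLM calls.

experience: list[str] = []  # Experience-learning module's store

def reason(task: str) -> str:
    # Reasoning: plan using past lessons (CoT/ToT would go here).
    lessons = "; ".join(experience) or "none"
    return f"plan for {task} (lessons: {lessons})"

def act(plan: str) -> str:
    # Acting: tool calling / code execution would go here.
    return f"result of [{plan}]"

def interact(result: str) -> str:
    # Interacting: environment or user feedback on the result.
    return f"feedback on {result}"

def self_improve(feedback: str) -> str:
    # Self-improving: Reflexion-style reflection on the feedback.
    return f"lesson from {feedback}"

def raise_cycle(task: str, rounds: int = 2) -> list[str]:
    lessons = []
    for _ in range(rounds):
        plan = reason(task)
        feedback = interact(act(plan))
        lesson = self_improve(feedback)
        experience.append(lesson)   # Experience-learning closes the loop
        lessons.append(lesson)
    return lessons
```

The point of the sketch is the data flow: the second round's plan differs from the first precisely because the first round's lesson is now in the experience store.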
## 4. Mapping LLM to Cognitive Functions

### 4.1 Cognitive Function Mapping Table
| Cognitive Function | Classic Implementation | LLM Implementation |
|---|---|---|
| Perception | Sensors + Feature extraction | Multimodal input encoding (text/image/audio) |
| Attention | Feature selection | Self-Attention mechanism + Context window management |
| Working Memory | Limited-capacity buffer | Context window (limited, but far exceeds the human 7±2 chunks) |
| Long-Term Memory | Knowledge base / Semantic networks | Model parameters (implicit) + External storage (explicit) |
| Reasoning | Logical deduction / Search | Autoregressive generation + CoT |
| Planning | STRIPS / HTN | Task decomposition + Multi-step generation |
| Learning | Parameter updates / Rule acquisition | In-context Learning + Experience memory |
| Language | NLU/NLG pipeline | End-to-end language model (natural advantage) |
| Metacognition | Meta-reasoning rules | Self-reflection / Confidence estimation |
### 4.2 LLM as "System 1 + System 2"
Drawing on Kahneman's dual-process theory:
| Feature | System 1 (Fast Thinking) | System 2 (Slow Thinking) | LLM Implementation |
|---|---|---|---|
| Speed | Fast, automatic | Slow, controlled | Direct generation vs. CoT |
| Effort | Low cognitive load | High cognitive load | Short reply vs. Long reasoning chain |
| Awareness | Unconscious | Conscious | Implicit knowledge vs. Explicit thinking |
| Control | Hard to control | Deliberately controlled | Low vs. high temperature |
LLM's "System 1" is direct answer generation in a single forward pass, \( x \rightarrow y \).

LLM's "System 2" is deliberative reasoning through CoT:

\[ x \rightarrow t_1 \rightarrow t_2 \rightarrow \cdots \rightarrow t_n \rightarrow y \]

where \(t_1, \ldots, t_n\) are intermediate thinking steps.
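A practical consequence of this mapping is routing: cheap direct generation for simple queries, a CoT prompt for harder ones. The keyword heuristic and prompt wording below are assumptions for illustration; production systems often use a trained classifier or the model's own self-assessment instead:

```python
# Sketch: route a query to "System 1" (direct answer) or "System 2" (CoT prompt).
# The keyword heuristic and the prompt template are illustrative assumptions.

HARD_MARKERS = ("prove", "step by step", "calculate", "plan", "debug")

def needs_system2(query: str) -> bool:
    q = query.lower()
    return any(marker in q for marker in HARD_MARKERS) or len(q.split()) > 30

def build_prompt(query: str) -> str:
    if needs_system2(query):
        # System 2: elicit intermediate steps t_1..t_n before the answer.
        return f"{query}\n\nLet's think step by step."
    # System 1: answer directly in a single pass.
    return query
```

This trades a small routing cost against the much larger cost of running long reasoning chains on every query.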
> **Cross-reference:** for a detailed discussion of chain-of-thought, see Chain-of-Thought and Reasoning Patterns.
## 5. Modern LLM Agent Architecture Patterns

### 5.1 Standard ReAct Architecture
The most basic LLM agent architecture:
```mermaid
graph TD
    U[User Input] --> LLM
    LLM --> T[Thought<br/>Reasoning]
    T --> A[Action<br/>Action Selection]
    A --> TOOL[Tool Execution]
    TOOL --> O[Observation<br/>Execution Result]
    O --> LLM
    LLM -->|Task Complete| R[Final Answer]
```
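The loop above can be sketched end to end with a scripted stand-in for the model. The `Thought:`/`Action:`/`Observation:` text format follows the ReAct convention; the scripted model and the calculator tool are illustrative stand-ins:

```python
# Minimal ReAct loop with a scripted stand-in for the LLM.
# The Thought:/Action:/Final Answer: format follows the ReAct convention;
# the scripted model and the calculator tool are illustrative stand-ins.

def calculator(expr: str) -> str:
    # Toy tool: evaluate a product like "17*24".
    a, b = expr.split("*")
    return str(int(a) * int(b))

TOOLS = {"calculator": calculator}

# Scripted outputs standing in for real LLM calls.
SCRIPT = iter([
    "Thought: I should compute the product.\nAction: calculator[17*24]",
    "Thought: I have the result.\nFinal Answer: 408",
])

def fake_llm(prompt: str) -> str:
    return next(SCRIPT)

def react(question: str, max_steps: int = 5) -> str:
    prompt = question
    for _ in range(max_steps):
        output = fake_llm(prompt)
        if "Final Answer:" in output:
            return output.split("Final Answer:")[1].strip()
        # Parse "Action: tool[input]" and execute the tool.
        action_line = next(line for line in output.splitlines()
                           if line.startswith("Action:"))
        tool_name, _, rest = action_line.removeprefix("Action: ").partition("[")
        observation = TOOLS[tool_name](rest.rstrip("]"))
        # Feed the observation back into the next reasoning step.
        prompt += f"\n{output}\nObservation: {observation}"
    return "max steps reached"
```

In a real agent, `fake_llm` is a model call and the loop's only job is parsing actions and appending observations.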
### 5.2 Memory-Augmented Architecture

```mermaid
graph TD
    U[User Input] --> WM[Working Memory Assembly]
    SYS[System Prompt] --> WM
    MEM[Long-Term Memory Retrieval] --> WM
    HIST[Conversation History] --> WM
    WM --> LLM[LLM Reasoning]
    LLM --> D{Decision}
    D -->|Think| T[Internal Reasoning]
    D -->|Act| A[Tool Calling]
    D -->|Reflect| REF[Self-Evaluation]
    D -->|Answer| R[Final Response]
    T --> LLM
    A --> O[Observation] --> LLM
    REF --> MEM_W[Write to Memory]
    MEM_W --> MEM
```
### 5.3 Anthropic's Agent Design Patterns

Patterns summarized by Anthropic (2024) in *Building Effective Agents*:
| Pattern | Description | Applicable Scenario |
|---|---|---|
| Augmented LLM | LLM + retrieval + tools, no loop | Simple tasks completable in one step |
| Prompt Chaining | Multiple LLM calls chained, each with clear I/O | Tasks decomposable into fixed steps |
| Routing | Route to different processing flows based on input type | Multiple input types with specialized processing |
| Parallelization | Multiple LLMs process subtasks simultaneously | Independent subtasks + result aggregation |
| Orchestrator-Worker | Central LLM assigns tasks to worker LLMs | Complex tasks requiring dynamic decomposition |
| Evaluator-Optimizer | One LLM generates, another evaluates | Generation tasks with clear quality criteria |
| Autonomous Loop | Agent loops autonomously until completion | Open-ended tasks requiring exploration |
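Prompt Chaining, the simplest looping-free pattern after the augmented LLM, can be sketched as a fixed pipeline with explicit intermediate I/O and a validation gate between steps. The step functions here are deterministic stand-ins for real model calls:

```python
# Sketch of the Prompt Chaining pattern: fixed steps, each with clear I/O,
# and a programmatic gate between them. The two "LLM call" functions are
# deterministic stand-ins for real model calls.

def outline_step(topic: str) -> list[str]:
    # Step 1 (an LLM call in practice): produce an outline for the topic.
    return [f"{topic}: intro", f"{topic}: body", f"{topic}: conclusion"]

def draft_step(outline: list[str]) -> str:
    # Step 2 (an LLM call in practice): expand the outline into a draft.
    return "\n".join(f"Section - {item}" for item in outline)

def check_step(draft: str) -> str:
    # Gate: validate the intermediate output before passing it on.
    assert draft.count("Section") == 3, "outline was not fully expanded"
    return draft

def chain(topic: str) -> str:
    return check_step(draft_step(outline_step(topic)))
```

The programmatic gate is what distinguishes this pattern from simply concatenating prompts: a bad intermediate output fails fast instead of propagating.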
## 6. Key Decisions in Designing LLM Cognitive Architectures

### 6.1 Memory Strategy
- What information goes into working memory? (Context window management)
- What information is stored in long-term memory? (Selective storage)
- How to retrieve? (Semantic search, temporal decay, importance weighting)
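The three retrieval signals listed above can be combined into a single weighted score. This sketch borrows the weighted-sum form popular in generative-agent memory systems; the bag-of-words similarity, decay rate, and weights are illustrative choices:

```python
# Sketch: score memories by semantic similarity + temporal decay + importance.
# The Jaccard similarity stand-in, the decay rate, and the weights are
# illustrative choices, not a canonical formula.
import math

def similarity(query: str, text: str) -> float:
    # Jaccard word overlap as a stand-in for embedding cosine similarity.
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / max(1, len(q | t))

def score(query: str, text: str, age_hours: float, importance: float,
          w_sim: float = 1.0, w_rec: float = 1.0, w_imp: float = 1.0) -> float:
    recency = math.exp(-0.1 * age_hours)   # exponential temporal decay
    return (w_sim * similarity(query, text)
            + w_rec * recency
            + w_imp * importance)

def retrieve(query: str, memories: list[tuple[str, float, float]], k: int = 2):
    # memories: (text, age_hours, importance in [0, 1])
    ranked = sorted(memories,
                    key=lambda m: score(query, m[0], m[1], m[2]),
                    reverse=True)
    return [m[0] for m in ranked[:k]]
```

Tuning the three weights is itself a design decision: a high recency weight makes the agent conversational, a high importance weight makes it goal-persistent.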
### 6.2 Reasoning Depth
- When to use "System 1" (direct answer)?
- When to use "System 2" (deep CoT reasoning)?
- When is tree search (ToT) needed?
- Trade-off between reasoning depth and computational cost
### 6.3 Action Granularity
- Atomic actions (single API call) vs. composite actions (predefined workflows)
- Size of the action space and selection difficulty
- Action validation and safety checks
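Action validation can be as simple as an allowlist plus per-tool argument checks applied before execution. The tool registry and checks below are illustrative assumptions:

```python
# Sketch: validate a proposed action against an allowlist before executing it.
# The registry contents and per-tool checks are illustrative assumptions.

ALLOWED_TOOLS = {
    "read_file":  {"path"},            # allowed argument names per tool
    "web_search": {"query", "top_k"},
}

def validate(tool: str, args: dict) -> tuple[bool, str]:
    if tool not in ALLOWED_TOOLS:
        return False, f"tool '{tool}' is not allowlisted"
    unknown = set(args) - ALLOWED_TOOLS[tool]
    if unknown:
        return False, f"unexpected arguments: {sorted(unknown)}"
    if tool == "read_file" and ".." in args.get("path", ""):
        return False, "path traversal rejected"   # basic safety check
    return True, "ok"
```

Because LLM-proposed actions are untrusted input, the same validation applies whether the action came from the model or from a predefined workflow.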
### 6.4 Control Flow
- Fixed loop (ReAct loop) vs. dynamic planning (Plan-and-Execute)
- When to stop? (Termination condition design)
- How to recover from failure? (Retry, fallback, escalate)
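The recovery ladder in the last bullet (retry, then fallback, then escalate) can be sketched generically; the callables and retry count here are placeholders:

```python
# Sketch of a failure-recovery ladder: retry the primary path, then degrade
# to a fallback, then escalate. The callables and retry count are placeholders.

def recover(primary, fallback, attempts: int = 2):
    for _ in range(attempts):
        try:
            return primary()            # retry the primary path
        except Exception:
            continue
    try:
        return fallback()               # degrade to a cheaper/safer path
    except Exception as e:
        # Last resort: surface the failure to a human operator.
        raise RuntimeError("escalate to human") from e
```

In an agent, `primary` might be a tool call and `fallback` a direct LLM answer without the tool; escalation ends the autonomous loop rather than letting it spin.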
## 7. Open Challenges
- Hallucination problem: LLM "beliefs" may be fabricated, lacking grounding
- Long-term consistency: Maintaining consistent "beliefs" and "intentions" across multi-turn conversations
- Learning efficiency: In-context learning is far less sample-efficient than human learning
- Metacognition: LLMs struggle to accurately assess what they "know" and "don't know"
- Scalable memory: How to manage ever-growing memory without losing critical information
- Multimodal integration: How to uniformly process text, images, code, and other modalities
## References
- Sumers, T. et al. (2024). Cognitive Architectures for Language Agents. arXiv:2309.02427.
- Anthropic. (2024). Building Effective Agents. anthropic.com.
- Weng, L. (2023). LLM Powered Autonomous Agents. lilianweng.github.io.
- Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.
- Yao, S. et al. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. ICLR 2023.
- Wang, L. et al. (2024). A Survey on Large Language Model based Autonomous Agents. Frontiers of Computer Science.