LLM Cognitive Architecture

Overview

The emergence of Large Language Models (LLMs) has given rise to an entirely new cognitive architecture paradigm: using the LLM as the core "cognitive engine," combined with structured memory, tools, and control flow to build agents. This article provides an in-depth analysis of the CoALA (Cognitive Architectures for Language Agents) framework, the RAISE architecture, and how LLMs map to classical cognitive functions.


1. Why Do We Need LLM Cognitive Architectures?

An LLM by itself is merely a "text-to-text" function:

\[ f_{\theta}: \text{String} \rightarrow \text{String} \]
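
Viewed this way, everything an "agent" does must be built on top of a single call. A minimal sketch of this interface (call_model is a hypothetical stand-in for any completion API):

def f_theta(prompt: str) -> str:
    """At its interface, an LLM is a pure string-to-string function."""
    # call_model is a hypothetical stand-in for any completion API;
    # the parameters theta (the model weights) are fixed inside the call.
    return call_model(prompt)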

To turn an LLM into an agent, we need to answer:

  1. Memory: How to extend the limited context window?
  2. Reasoning: How to achieve multi-step, backtrackable reasoning?
  3. Action: How to interact with the external world?
  4. Learning: How to improve from experience?
  5. Control: How to coordinate these components?

Cognitive architectures provide systematic answers to these questions.


2. The CoALA Framework

CoALA (Cognitive Architectures for Language Agents), proposed by Sumers et al. (2024), is among the most comprehensive cognitive architecture frameworks for LLM agents to date.

2.1 Architecture Overview

graph TD
    subgraph CoALA Framework
        subgraph Memory System
            WM[Working Memory]
            LTM[Long-Term Memory]
            LTM --> EP[Episodic Memory]
            LTM --> SEM[Semantic Memory]
            LTM --> PROC[Procedural Memory]
        end

        subgraph Decision Process
            LLM_CORE[LLM Core<br/>Reasoning Engine]
            DL[Deliberation Loop]
        end

        subgraph Action Space
            INT[Internal Actions<br/>Reasoning / Retrieval / Learning]
            EXT[External Actions<br/>Tool Calling / Environment Interaction]
        end

        ENV[External Environment] -->|Perception| WM
        WM --> LLM_CORE
        LTM --> WM
        LLM_CORE --> DL
        DL --> INT
        DL --> EXT
        INT -->|Update| WM
        INT -->|Store| LTM
        EXT -->|Observation| WM
        EXT --> ENV
    end

2.2 Memory System

CoALA divides LLM agent memory into working memory plus three types of long-term memory:

Working Memory

Corresponds to the LLM's context window:

| Component | Content |
|---|---|
| System prompt | Role definition, behavioral norms |
| Conversation history | User messages and assistant replies |
| Retrieved results | Relevant information from long-term memory |
| Tool output | Return results from tool calls |
| Intermediate reasoning | Thinking steps from CoT |

Working memory capacity is limited by the context window:

\[ |\text{WM}| \leq L_{\text{context}} \quad (\text{e.g., 128K tokens}) \]
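
A minimal sketch of assembling working memory under this budget (count_tokens and the argument shapes are illustrative assumptions, not CoALA's API):

def assemble_working_memory(system_prompt, retrieved, history, budget=128_000):
    """Pack working-memory components into the context window, newest history first."""
    parts = [system_prompt] + retrieved            # fixed components go in first
    used = sum(count_tokens(p) for p in parts)     # count_tokens: hypothetical tokenizer helper
    kept = []
    for msg in reversed(history):                  # keep the most recent turns that still fit
        cost = count_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return parts + list(reversed(kept))            # restore chronological order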

Long-Term Memory Subtypes

| Type | Stored Content | Implementation | Classic Counterpart |
|---|---|---|---|
| Episodic | Past interaction experiences | Vector DB + temporal index | ACT-R declarative memory |
| Semantic | Facts and conceptual knowledge | Knowledge graphs / document stores | ACT-R declarative memory |
| Procedural | Skills and methods | Code libraries / tool definitions / few-shot examples | SOAR production rules |
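
A sketch of how the three stores might be represented in code (the class shapes are illustrative assumptions, not CoALA's specification):

from dataclasses import dataclass, field

@dataclass
class Episode:          # episodic: a time-stamped interaction experience
    timestamp: float
    observation: str
    action: str
    outcome: str

@dataclass
class Fact:             # semantic: (subject, relation, object) knowledge
    subject: str
    relation: str
    obj: str

@dataclass
class Skill:            # procedural: an executable method plus usage examples
    name: str
    code: str
    few_shot_examples: list[str] = field(default_factory=list)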

2.3 Decision Process

CoALA's decision process is a deliberation loop:

while not task_complete():          # task_complete() and the helpers below are placeholders
    # 1. Perception: acquire new information
    observation = perceive(environment)
    working_memory.update(observation)

    # 2. Retrieval: fetch relevant information from long-term memory
    retrieved = long_term_memory.retrieve(working_memory.query())
    working_memory.update(retrieved)

    # 3. Reasoning: the LLM reasons over the current working memory
    thought, action = llm.reason(working_memory)

    # 4. Action selection and execution
    if action.type == "internal":
        # Internal action: reasoning, memory update, learning
        result = execute_internal(action)
    else:
        # External action: tool calling, environment interaction
        result = execute_external(action)

    # 5. Update memory
    working_memory.update(result)
    long_term_memory.store(episode=(observation, thought, action, result))

2.4 Action Space

CoALA divides agent actions into internal and external categories (a dispatch sketch follows the two lists below):

Internal Actions (do not change the external environment):

  • Reasoning: CoT, ToT, and other thinking processes
  • Retrieval: Querying information from memory or knowledge bases
  • Learning: Updating knowledge in long-term memory

External Actions (affect the external environment):

  • Tool Use: Calling APIs, executing code
  • Communication: Interacting with users or other agents
  • Environment Operations (Grounding): Manipulating files, browsing the web
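
A minimal dispatch sketch for this two-way split, reusing the execute_internal / execute_external placeholders from the deliberation loop above (the action-type names are illustrative):

INTERNAL = {"reason", "retrieve", "learn"}        # never touch the environment
EXTERNAL = {"tool_use", "communicate", "ground"}  # affect the environment

def dispatch(action):
    if action.type in INTERNAL:
        return execute_internal(action)   # updates working/long-term memory only
    elif action.type in EXTERNAL:
        return execute_external(action)   # calls tools, talks to users, edits files
    raise ValueError(f"Unknown action type: {action.type}")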

3. RAISE Architecture

RAISE (Reasoning, Acting, Interacting, Self-improving, Experience-learning) is another influential LLM cognitive architecture.

3.1 Five Core Modules

graph LR
    R[Reasoning Module] --> A[Acting Module]
    A --> I[Interacting Module]
    I --> S[Self-Improving]
    S --> E[Experience Learning]
    E --> R

    R -.-> |CoT/ToT| R
    A -.-> |Tool Calling| A
    I -.-> |Environment Feedback| I
    S -.-> |Reflection| S
    E -.-> |Memory Storage| E

| Module | Function | Key Techniques |
|---|---|---|
| Reasoning | Multi-step reasoning and planning | CoT, ToT, Self-Consistency |
| Acting | Executing actions and using tools | Function Calling, code execution |
| Interacting | Interacting with environment and users | Dialogue management, feedback processing |
| Self-improving | Reflection and self-correction | Reflexion, Self-Refine |
| Experience-learning | Experience accumulation and learning | Memory storage, case libraries |
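
Read as a control loop, the five modules compose roughly as follows (a sketch of the cycle in the diagram above, with assumed method names, not the original authors' code):

def raise_step(task, llm, env, memory):
    """One pass around the RAISE cycle: R -> A -> I -> S -> E."""
    plan = llm.reason(task, memory)                   # Reasoning: CoT/ToT planning
    result = llm.act(plan)                            # Acting: tool calls, code execution
    feedback = env.interact(result)                   # Interacting: environment/user feedback
    critique = llm.reflect(plan, result, feedback)    # Self-improving: Reflexion-style critique
    memory.store((task, plan, feedback, critique))    # Experience-learning: save the episode
    return feedback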

4. Mapping LLM to Cognitive Functions

4.1 Cognitive Function Mapping Table

| Cognitive Function | Classic Implementation | LLM Implementation |
|---|---|---|
| Perception | Sensors + feature extraction | Multimodal input encoding (text/image/audio) |
| Attention | Feature selection | Self-attention + context window management |
| Working memory | Limited-capacity buffer | Context window (limited, but far beyond the human 7 ± 2) |
| Long-term memory | Knowledge base / semantic networks | Model parameters (implicit) + external storage (explicit) |
| Reasoning | Logical deduction / search | Autoregressive generation + CoT |
| Planning | STRIPS / HTN | Task decomposition + multi-step generation |
| Learning | Parameter updates / rule acquisition | In-context learning + experience memory |
| Language | NLU/NLG pipeline | End-to-end language model (natural advantage) |
| Metacognition | Meta-reasoning rules | Self-reflection / confidence estimation |

4.2 LLM as "System 1 + System 2"

Drawing on Kahneman's dual-process theory:

| Feature | System 1 (Fast Thinking) | System 2 (Slow Thinking) | LLM Implementation |
|---|---|---|---|
| Speed | Fast, automatic | Slow, controlled | Direct generation vs. CoT |
| Effort | Low cognitive load | High cognitive load | Short reply vs. long reasoning chain |
| Awareness | Unconscious | Conscious | Implicit knowledge vs. explicit thinking |
| Control | Hard to control | Deliberately controlled | Low temperature vs. high temperature |

LLM's "System 1": Direct answer generation (single forward pass)

\[ y^{*} = \arg\max_{y} P(y \mid x; \theta) \]

LLM's "System 2": Deliberative reasoning through CoT

\[ y^{*} = \arg\max_{y} P(y \mid x, t_1, t_2, \ldots, t_n; \theta) \]

where \(t_1, \ldots, t_n\) are intermediate thinking steps.
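
In practice the two modes differ only in prompting; a minimal sketch (call_model is the same hypothetical completion helper as in Section 1):

def system1_answer(question: str) -> str:
    # Single forward pass: answer directly, no intermediate steps.
    return call_model(f"Answer concisely: {question}")

def system2_answer(question: str) -> str:
    # Elicit intermediate thinking steps t_1, ..., t_n before the final answer.
    prompt = f"Think step by step, then state the final answer.\nQuestion: {question}"
    return call_model(prompt)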

Cross-Reference

For a detailed discussion of chain-of-thought, see Chain-of-Thought and Reasoning Patterns.


5. Modern LLM Agent Architecture Patterns

5.1 Standard ReAct Architecture

The most basic LLM agent architecture:

graph TD
    U[User Input] --> LLM
    LLM --> T[Thought<br/>Reasoning]
    T --> A[Action<br/>Action Selection]
    A --> TOOL[Tool Execution]
    TOOL --> O[Observation<br/>Execution Result]
    O --> LLM
    LLM --> |Task Complete| R[Final Answer]
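
A compressed sketch of this loop (the transcript format and parse_action helper are illustrative; see Yao et al. for the original formulation):

def react(question, llm, tools, max_steps=10):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm.generate(transcript)                 # emits Thought + Action, or a Final Answer
        transcript += step
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        tool_name, tool_input = parse_action(step)      # parse_action: hypothetical parser
        observation = tools[tool_name](tool_input)
        transcript += f"\nObservation: {observation}\n"
    return "Step limit reached without a final answer."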

5.2 Memory-Augmented Architecture

graph TD
    U[User Input] --> WM[Working Memory Assembly]
    SYS[System Prompt] --> WM
    MEM[Long-Term Memory Retrieval] --> WM
    HIST[Conversation History] --> WM
    WM --> LLM[LLM Reasoning]
    LLM --> D{Decision}
    D -->|Think| T[Internal Reasoning]
    D -->|Act| A[Tool Calling]
    D -->|Reflect| REF[Self-Evaluation]
    D -->|Answer| R[Final Response]
    T --> LLM
    A --> O[Observation] --> LLM
    REF --> MEM_W[Write to Memory]
    MEM_W --> MEM

5.3 Anthropic's Agent Design Patterns

Patterns summarized by Anthropic (2024) in Building Effective Agents:

| Pattern | Description | Applicable Scenario |
|---|---|---|
| Augmented LLM | LLM + retrieval + tools, no loop | Simple tasks completable in one step |
| Prompt Chaining | Multiple LLM calls chained, each with clear I/O | Tasks decomposable into fixed steps |
| Routing | Route to different processing flows based on input type | Multiple input types needing specialized handling |
| Parallelization | Multiple LLM calls process subtasks simultaneously | Independent subtasks + result aggregation |
| Orchestrator-Worker | A central LLM assigns tasks to worker LLMs | Complex tasks requiring dynamic decomposition |
| Evaluator-Optimizer | One LLM generates, another evaluates | Generation tasks with clear quality criteria |
| Autonomous Loop | The agent loops autonomously until completion | Open-ended tasks requiring exploration |
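
As one concrete example, Prompt Chaining reduces to sequential calls with validated intermediate output (a sketch under assumed task steps, not Anthropic's reference code):

def prompt_chain(document, llm):
    # Step 1 -> 2 -> 3, each with clear input/output; a gate checks the middle step.
    facts = llm(f"Extract the key facts as a bullet list:\n{document}")
    summary = llm(f"Write a one-paragraph summary of these facts:\n{facts}")
    if len(summary.split()) > 150:                 # gate: validate intermediate output
        summary = llm(f"Shorten this to under 150 words:\n{summary}")
    return llm(f"Translate this summary into French:\n{summary}")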

6. Key Decisions in Designing LLM Cognitive Architectures

6.1 Memory Strategy

\[ \text{Memory Utility} = \text{Relevance}(q, m) \times \text{Recency}(t_m) \times \text{Importance}(m) \]

  • What information goes into working memory? (Context window management)
  • What information is stored in long-term memory? (Selective storage)
  • How to retrieve? (Semantic search, temporal decay, importance weighting; see the scoring sketch below)
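
A sketch of this scoring rule, using exponential temporal decay as one possible Recency term (the decay constant and the cosine_similarity helper are assumptions):

import math
import time

def memory_utility(query_embedding, memory, decay_hours=24.0):
    relevance = cosine_similarity(query_embedding, memory.embedding)  # assumed helper
    age_hours = (time.time() - memory.timestamp) / 3600
    recency = math.exp(-age_hours / decay_hours)      # exponential temporal decay
    return relevance * recency * memory.importance    # importance: stored score in [0, 1]

def retrieve_top_k(query_embedding, memories, k=5):
    return sorted(memories, key=lambda m: memory_utility(query_embedding, m), reverse=True)[:k]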

6.2 Reasoning Depth

  • When to use "System 1" (direct answer)?
  • When to use "System 2" (deep CoT reasoning)?
  • When is tree search (ToT) needed?
  • Trade-off between reasoning depth and computational cost

6.3 Action Granularity

  • Atomic actions (single API call) vs. composite actions (predefined workflows)
  • Size of the action space and selection difficulty
  • Action validation and safety checks

6.4 Control Flow

  • Fixed loop (ReAct loop) vs. dynamic planning (Plan-and-Execute)
  • When to stop? (Termination condition design)
  • How to recover from failure? (Retry, fallback, escalate; see the sketch below)
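
The last point, failure recovery, might look like the following (a sketch; the retry-fallback-escalate ladder is one common policy, not a fixed standard):

def run_with_recovery(action, fallback, max_retries=2):
    last_error = None
    for _ in range(max_retries + 1):
        try:
            return action()                  # 1. retry the primary action
        except Exception as err:
            last_error = err
    try:
        return fallback()                    # 2. fall back to a simpler strategy
    except Exception:
        raise RuntimeError(                  # 3. escalate to a human or supervisor agent
            f"Primary and fallback both failed; escalating. Last error: {last_error}"
        )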

7. Open Challenges

  1. Hallucination problem: LLM "beliefs" may be fabricated, lacking grounding
  2. Long-term consistency: Maintaining consistent "beliefs" and "intentions" across multi-turn conversations
  3. Learning efficiency: In-context learning is far less sample-efficient than human learning
  4. Metacognition: LLMs struggle to accurately assess what they "know" and "don't know"
  5. Scalable memory: How to manage ever-growing memory without losing critical information
  6. Multimodal integration: How to uniformly process text, images, code, and other modalities

References

  1. Sumers, T. et al. (2024). Cognitive Architectures for Language Agents. arXiv:2309.02427.
  2. Anthropic. (2024). Building Effective Agents. anthropic.com.
  3. Weng, L. (2023). LLM Powered Autonomous Agents. lilianweng.github.io.
  4. Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.
  5. Yao, S. et al. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. ICLR 2023.
  6. Wang, L. et al. (2024). A Survey on Large Language Model based Autonomous Agents. Frontiers of Computer Science.
