# LLM Cognitive Architecture

## Overview
The emergence of Large Language Models (LLMs) has given rise to an entirely new cognitive architecture paradigm: using the LLM as the core "cognitive engine," combined with structured memory, tools, and control flow to build agents. This article provides an in-depth analysis of the CoALA (Cognitive Architectures for Language Agents) framework, the RAISE architecture, and how LLMs map to classical cognitive functions.
## 1. Why Do We Need LLM Cognitive Architectures?

An LLM by itself is merely a "text-to-text" function, \( \text{LLM}: \text{prompt} \mapsto \text{response} \), with no persistent state between calls.
To turn an LLM into an agent, we need to answer:
- Memory: How to extend the limited context window?
- Reasoning: How to achieve multi-step, backtrackable reasoning?
- Action: How to interact with the external world?
- Learning: How to improve from experience?
- Control: How to coordinate these components?
Cognitive architectures provide systematic answers to these questions.
## 2. The CoALA Framework

CoALA (Cognitive Architectures for Language Agents), proposed by Sumers et al. (2024), is among the most complete cognitive architecture frameworks yet proposed for LLM agents.
### 2.1 Architecture Overview

```mermaid
graph TD
    subgraph "CoALA Framework"
        subgraph "Memory System"
            WM[Working Memory]
            LTM[Long-Term Memory]
            LTM --> EP[Episodic Memory]
            LTM --> SEM[Semantic Memory]
            LTM --> PROC[Procedural Memory]
        end
        subgraph "Decision Process"
            LLM_CORE[LLM Core<br/>Reasoning Engine]
            DL[Deliberation Loop]
        end
        subgraph "Action Space"
            INT[Internal Actions<br/>Reasoning / Retrieval / Learning]
            EXT[External Actions<br/>Tool Calling / Environment Interaction]
        end
        ENV[External Environment] -->|Perception| WM
        WM --> LLM_CORE
        LTM --> WM
        LLM_CORE --> DL
        DL --> INT
        DL --> EXT
        INT -->|Update| WM
        INT -->|Store| LTM
        EXT -->|Observation| WM
        EXT --> ENV
    end
```
### 2.2 Memory System
CoALA divides LLM agent memory into three levels:
#### Working Memory
Corresponds to the LLM's context window:
| Component | Content |
|---|---|
| System prompt | Role definition, behavioral norms |
| Conversation history | User messages and assistant replies |
| Retrieved results | Relevant information from long-term memory |
| Tool output | Return results from tool calls |
| Intermediate reasoning | Thinking steps from CoT |
Working memory capacity is bounded by the LLM's context window: everything the model can attend to in a single call must fit within it.
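The table above can be made concrete as a context-assembly routine that packs components into the window under a token budget. This is a minimal sketch: the 4-characters-per-token estimate and the priority order (system prompt, then retrieved facts, then recent history) are illustrative assumptions, not part of CoALA itself.

```python
# Sketch: assemble working-memory components into one prompt under a token budget.
# The 4-chars-per-token estimate and the priority order are illustrative assumptions.

def estimate_tokens(text: str) -> int:
    """Crude token estimate (~4 characters per token for English text)."""
    return max(1, len(text) // 4)

def assemble_working_memory(system_prompt: str,
                            history: list[str],
                            retrieved: list[str],
                            budget: int = 4096) -> str:
    """Fill the context window: system prompt first, then retrieved facts,
    then as much recent conversation history as still fits."""
    parts = [system_prompt]
    used = estimate_tokens(system_prompt)
    for fact in retrieved:
        cost = estimate_tokens(fact)
        if used + cost > budget:
            break
        parts.append(fact)
        used += cost
    # Walk history from most recent backwards so the newest turns survive truncation.
    kept: list[str] = []
    for turn in reversed(history):
        cost = estimate_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    parts.extend(reversed(kept))
    return "\n".join(parts)
```

Truncating history from the oldest end is the usual default, since recent turns are most likely to matter for the next step.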
#### Long-Term Memory Subtypes
| Type | Stored Content | Implementation | Classic Counterpart |
|---|---|---|---|
| Episodic | Past interaction experiences | Vector DB + Temporal index | ACT-R Declarative Memory |
| Semantic | Facts and conceptual knowledge | Knowledge graphs / Document stores | ACT-R Declarative Memory |
| Procedural | Skills and methods | Code libraries / Tool definitions / Few-shot | SOAR Production Rules |
### 2.3 Decision Process
CoALA's decision process is a deliberation loop:
```python
while task_not_complete:
    # 1. Perception: acquire new information
    observation = perceive(environment)
    working_memory.update(observation)

    # 2. Retrieval: fetch relevant information from long-term memory
    retrieved = long_term_memory.retrieve(working_memory.query())
    working_memory.update(retrieved)

    # 3. Reasoning: the LLM reasons over working memory
    thought, action = llm.reason(working_memory)

    # 4. Action selection and execution
    if action.type == "internal":
        # Internal action: reasoning, memory update, learning
        result = execute_internal(action)
    elif action.type == "external":
        # External action: tool calling, environment interaction
        result = execute_external(action)

    # 5. Update memory
    working_memory.update(result)
    long_term_memory.store(episode=(observation, thought, action, result))
```
### 2.4 Action Space
CoALA divides agent actions into internal and external categories:
**Internal Actions** (do not change the external environment):
- Reasoning: CoT, ToT, and other thinking processes
- Retrieval: Querying information from memory or knowledge bases
- Learning: Updating knowledge in long-term memory
**External Actions** (affect the external environment):
- Tool Use: Calling APIs, executing code
- Communication: Interacting with users or other agents
- Environment Operations (Grounding): Manipulating files, browsing the web
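The internal/external split can be made concrete as a dispatch table. The action names and handlers below are illustrative assumptions, not part of the CoALA specification:

```python
# Sketch of CoALA-style action dispatch; the action names and the two
# handler branches are illustrative assumptions, not part of the framework.
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str                 # e.g. "retrieve", "tool_use"
    payload: dict = field(default_factory=dict)

INTERNAL = {"reason", "retrieve", "learn"}        # only touch the agent's memory
EXTERNAL = {"tool_use", "communicate", "ground"}  # affect the environment

def dispatch(action: Action) -> str:
    if action.name in INTERNAL:
        # Internal actions read/write the agent's own memory state.
        return f"internal:{action.name}"
    if action.name in EXTERNAL:
        # External actions go through tool execution / environment APIs.
        return f"external:{action.name}"
    raise ValueError(f"unknown action: {action.name}")
```

Keeping the split explicit makes it easy to apply safety checks only to the external branch.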
## 3. RAISE Architecture
RAISE (Reasoning, Acting, Interacting, Self-improving, Experience-learning) is another influential LLM cognitive architecture.
### 3.1 Five Core Modules

```mermaid
graph LR
    R[Reasoning Module] --> A[Acting Module]
    A --> I[Interacting Module]
    I --> S[Self-Improving]
    S --> E[Experience Learning]
    E --> R
    R -.->|CoT/ToT| R
    A -.->|Tool Calling| A
    I -.->|Environment Feedback| I
    S -.->|Reflection| S
    E -.->|Memory Storage| E
```
| Module | Function | Key Techniques |
|---|---|---|
| Reasoning | Multi-step reasoning and planning | CoT, ToT, Self-Consistency |
| Acting | Executing actions and using tools | Function Calling, Code Execution |
| Interacting | Interacting with environment and users | Dialogue management, Feedback processing |
| Self-improving | Reflection and self-correction | Reflexion, Self-Refine |
| Experience | Experience accumulation and learning | Memory storage, Case libraries |
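The cycle in the diagram can be sketched as a loop in which each reflection feeds the experience store consulted by the next reasoning pass. All five module bodies below are toy stand-ins for real LLM calls:

```python
# Toy sketch of the RAISE cycle: each pass reasons with accumulated experience,
# acts, observes feedback, reflects, and stores the lesson for the next pass.
# All module bodies are illustrative stand-ins for LLM calls.

experience: list[str] = []  # Experience-learning module's store

def reason(task: str) -> str:
    # Reasoning: plan using past lessons (CoT/ToT would go here).
    lessons = "; ".join(experience) or "none"
    return f"plan for {task} (lessons: {lessons})"

def act(plan: str) -> str:
    # Acting: tool calling / code execution would go here.
    return f"result of [{plan}]"

def interact(result: str) -> str:
    # Interacting: environment or user feedback on the result.
    return f"feedback on {result}"

def self_improve(feedback: str) -> str:
    # Self-improving: Reflexion-style reflection on the feedback.
    return f"lesson from {feedback}"

def raise_cycle(task: str, rounds: int = 2) -> list[str]:
    lessons = []
    for _ in range(rounds):
        plan = reason(task)
        feedback = interact(act(plan))
        lesson = self_improve(feedback)
        experience.append(lesson)   # Experience-learning closes the loop
        lessons.append(lesson)
    return lessons
```

The point of the sketch is the data flow: the second round's plan differs from the first precisely because the first round's lesson is now in the experience store.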
## 4. Mapping LLM to Cognitive Functions

### 4.1 Cognitive Function Mapping Table
| Cognitive Function | Classic Implementation | LLM Implementation |
|---|---|---|
| Perception | Sensors + Feature extraction | Multimodal input encoding (text/image/audio) |
| Attention | Feature selection | Self-Attention mechanism + Context window management |
| Working Memory | Limited-capacity buffer | Context window (limited, but far exceeds the human 7±2 chunks) |
| Long-Term Memory | Knowledge base / Semantic networks | Model parameters (implicit) + External storage (explicit) |
| Reasoning | Logical deduction / Search | Autoregressive generation + CoT |
| Planning | STRIPS / HTN | Task decomposition + Multi-step generation |
| Learning | Parameter updates / Rule acquisition | In-context Learning + Experience memory |
| Language | NLU/NLG pipeline | End-to-end language model (natural advantage) |
| Metacognition | Meta-reasoning rules | Self-reflection / Confidence estimation |
### 4.2 LLM as "System 1 + System 2"
Drawing on Kahneman's dual-process theory:
| Feature | System 1 (Fast Thinking) | System 2 (Slow Thinking) | LLM Implementation |
|---|---|---|---|
| Speed | Fast, automatic | Slow, controlled | Direct generation vs. CoT |
| Effort | Low cognitive load | High cognitive load | Short reply vs. Long reasoning chain |
| Awareness | Unconscious | Conscious | Implicit knowledge vs. Explicit thinking |
| Control | Hard to control | Deliberately controlled | Low vs. high temperature |
LLM's "System 1" is direct answer generation in a single forward pass, \( x \rightarrow y \).

LLM's "System 2" is deliberative reasoning through CoT:

\[ x \rightarrow t_1 \rightarrow t_2 \rightarrow \cdots \rightarrow t_n \rightarrow y \]

where \(t_1, \ldots, t_n\) are intermediate thinking steps.
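A practical consequence of this mapping is routing: cheap direct generation for simple queries, a CoT prompt for harder ones. The keyword heuristic and prompt wording below are assumptions for illustration; production systems often use a trained classifier or the model's own self-assessment instead:

```python
# Sketch: route a query to "System 1" (direct answer) or "System 2" (CoT prompt).
# The keyword heuristic and the prompt template are illustrative assumptions.

HARD_MARKERS = ("prove", "step by step", "calculate", "plan", "debug")

def needs_system2(query: str) -> bool:
    q = query.lower()
    return any(marker in q for marker in HARD_MARKERS) or len(q.split()) > 30

def build_prompt(query: str) -> str:
    if needs_system2(query):
        # System 2: elicit intermediate steps t_1..t_n before the answer.
        return f"{query}\n\nLet's think step by step."
    # System 1: answer directly in a single pass.
    return query
```

This trades a small routing cost against the much larger cost of running long reasoning chains on every query.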
> **Cross-reference:** for a detailed discussion of chain-of-thought, see Chain-of-Thought and Reasoning Patterns.
## 5. Modern LLM Agent Architecture Patterns

### 5.1 Standard ReAct Architecture
The most basic LLM agent architecture:
```mermaid
graph TD
    U[User Input] --> LLM
    LLM --> T[Thought<br/>Reasoning]
    T --> A[Action<br/>Action Selection]
    A --> TOOL[Tool Execution]
    TOOL --> O[Observation<br/>Execution Result]
    O --> LLM
    LLM -->|Task Complete| R[Final Answer]
```
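The loop above can be sketched end to end with a scripted stand-in for the model. The `Thought:`/`Action:`/`Observation:` text format follows the ReAct convention; the scripted model and the calculator tool are illustrative stand-ins:

```python
# Minimal ReAct loop with a scripted stand-in for the LLM.
# The Thought:/Action:/Final Answer: format follows the ReAct convention;
# the scripted model and the calculator tool are illustrative stand-ins.

def calculator(expr: str) -> str:
    # Toy tool: evaluate a product like "17*24".
    a, b = expr.split("*")
    return str(int(a) * int(b))

TOOLS = {"calculator": calculator}

# Scripted outputs standing in for real LLM calls.
SCRIPT = iter([
    "Thought: I should compute the product.\nAction: calculator[17*24]",
    "Thought: I have the result.\nFinal Answer: 408",
])

def fake_llm(prompt: str) -> str:
    return next(SCRIPT)

def react(question: str, max_steps: int = 5) -> str:
    prompt = question
    for _ in range(max_steps):
        output = fake_llm(prompt)
        if "Final Answer:" in output:
            return output.split("Final Answer:")[1].strip()
        # Parse "Action: tool[input]" and execute the tool.
        action_line = next(line for line in output.splitlines()
                           if line.startswith("Action:"))
        tool_name, _, rest = action_line.removeprefix("Action: ").partition("[")
        observation = TOOLS[tool_name](rest.rstrip("]"))
        # Feed the observation back into the next reasoning step.
        prompt += f"\n{output}\nObservation: {observation}"
    return "max steps reached"
```

In a real agent, `fake_llm` is a model call and the loop's only job is parsing actions and appending observations.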
### 5.2 Memory-Augmented Architecture

```mermaid
graph TD
    U[User Input] --> WM[Working Memory Assembly]
    SYS[System Prompt] --> WM
    MEM[Long-Term Memory Retrieval] --> WM
    HIST[Conversation History] --> WM
    WM --> LLM[LLM Reasoning]
    LLM --> D{Decision}
    D -->|Think| T[Internal Reasoning]
    D -->|Act| A[Tool Calling]
    D -->|Reflect| REF[Self-Evaluation]
    D -->|Answer| R[Final Response]
    T --> LLM
    A --> O[Observation] --> LLM
    REF --> MEM_W[Write to Memory]
    MEM_W --> MEM
```
### 5.3 Anthropic's Agent Design Patterns

Patterns summarized by Anthropic (2024) in *Building Effective Agents*:
| Pattern | Description | Applicable Scenario |
|---|---|---|
| Augmented LLM | LLM + retrieval + tools, no loop | Simple tasks completable in one step |
| Prompt Chaining | Multiple LLM calls chained, each with clear I/O | Tasks decomposable into fixed steps |
| Routing | Route to different processing flows based on input type | Multiple input types with specialized processing |
| Parallelization | Multiple LLMs process subtasks simultaneously | Independent subtasks + result aggregation |
| Orchestrator-Worker | Central LLM assigns tasks to worker LLMs | Complex tasks requiring dynamic decomposition |
| Evaluator-Optimizer | One LLM generates, another evaluates | Generation tasks with clear quality criteria |
| Autonomous Loop | Agent loops autonomously until completion | Open-ended tasks requiring exploration |
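Prompt Chaining, the simplest looping-free pattern after the augmented LLM, can be sketched as a fixed pipeline with explicit intermediate I/O and a validation gate between steps. The step functions here are deterministic stand-ins for real model calls:

```python
# Sketch of the Prompt Chaining pattern: fixed steps, each with clear I/O,
# and a programmatic gate between them. The two "LLM call" functions are
# deterministic stand-ins for real model calls.

def outline_step(topic: str) -> list[str]:
    # Step 1 (an LLM call in practice): produce an outline for the topic.
    return [f"{topic}: intro", f"{topic}: body", f"{topic}: conclusion"]

def draft_step(outline: list[str]) -> str:
    # Step 2 (an LLM call in practice): expand the outline into a draft.
    return "\n".join(f"Section - {item}" for item in outline)

def check_step(draft: str) -> str:
    # Gate: validate the intermediate output before passing it on.
    assert draft.count("Section") == 3, "outline was not fully expanded"
    return draft

def chain(topic: str) -> str:
    return check_step(draft_step(outline_step(topic)))
```

The programmatic gate is what distinguishes this pattern from simply concatenating prompts: a bad intermediate output fails fast instead of propagating.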
## 6. Key Decisions in Designing LLM Cognitive Architectures

### 6.1 Memory Strategy
- What information goes into working memory? (Context window management)
- What information is stored in long-term memory? (Selective storage)
- How to retrieve? (Semantic search, temporal decay, importance weighting)
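The three retrieval signals listed above can be combined into a single weighted score. This sketch borrows the weighted-sum form popular in generative-agent memory systems; the bag-of-words similarity, decay rate, and weights are illustrative choices:

```python
# Sketch: score memories by semantic similarity + temporal decay + importance.
# The Jaccard similarity stand-in, the decay rate, and the weights are
# illustrative choices, not a canonical formula.
import math

def similarity(query: str, text: str) -> float:
    # Jaccard word overlap as a stand-in for embedding cosine similarity.
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / max(1, len(q | t))

def score(query: str, text: str, age_hours: float, importance: float,
          w_sim: float = 1.0, w_rec: float = 1.0, w_imp: float = 1.0) -> float:
    recency = math.exp(-0.1 * age_hours)   # exponential temporal decay
    return (w_sim * similarity(query, text)
            + w_rec * recency
            + w_imp * importance)

def retrieve(query: str, memories: list[tuple[str, float, float]], k: int = 2):
    # memories: (text, age_hours, importance in [0, 1])
    ranked = sorted(memories,
                    key=lambda m: score(query, m[0], m[1], m[2]),
                    reverse=True)
    return [m[0] for m in ranked[:k]]
```

Tuning the three weights is itself a design decision: a high recency weight makes the agent conversational, a high importance weight makes it goal-persistent.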
### 6.2 Reasoning Depth
- When to use "System 1" (direct answer)?
- When to use "System 2" (deep CoT reasoning)?
- When is tree search (ToT) needed?
- Trade-off between reasoning depth and computational cost
### 6.3 Action Granularity
- Atomic actions (single API call) vs. composite actions (predefined workflows)
- Size of the action space and selection difficulty
- Action validation and safety checks
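Action validation can be as simple as an allowlist plus per-tool argument checks applied before execution. The tool registry and checks below are illustrative assumptions:

```python
# Sketch: validate a proposed action against an allowlist before executing it.
# The registry contents and per-tool checks are illustrative assumptions.

ALLOWED_TOOLS = {
    "read_file":  {"path"},            # allowed argument names per tool
    "web_search": {"query", "top_k"},
}

def validate(tool: str, args: dict) -> tuple[bool, str]:
    if tool not in ALLOWED_TOOLS:
        return False, f"tool '{tool}' is not allowlisted"
    unknown = set(args) - ALLOWED_TOOLS[tool]
    if unknown:
        return False, f"unexpected arguments: {sorted(unknown)}"
    if tool == "read_file" and ".." in args.get("path", ""):
        return False, "path traversal rejected"   # basic safety check
    return True, "ok"
```

Because LLM-proposed actions are untrusted input, the same validation applies whether the action came from the model or from a predefined workflow.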
### 6.4 Control Flow
- Fixed loop (ReAct loop) vs. dynamic planning (Plan-and-Execute)
- When to stop? (Termination condition design)
- How to recover from failure? (Retry, fallback, escalate)
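The recovery ladder in the last bullet (retry, then fallback, then escalate) can be sketched generically; the callables and retry count here are placeholders:

```python
# Sketch of a failure-recovery ladder: retry the primary path, then degrade
# to a fallback, then escalate. The callables and retry count are placeholders.

def recover(primary, fallback, attempts: int = 2):
    for _ in range(attempts):
        try:
            return primary()            # retry the primary path
        except Exception:
            continue
    try:
        return fallback()               # degrade to a cheaper/safer path
    except Exception as e:
        # Last resort: surface the failure to a human operator.
        raise RuntimeError("escalate to human") from e
```

In an agent, `primary` might be a tool call and `fallback` a direct LLM answer without the tool; escalation ends the autonomous loop rather than letting it spin.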
## 7. Open Challenges
- Hallucination problem: LLM "beliefs" may be fabricated, lacking grounding
- Long-term consistency: Maintaining consistent "beliefs" and "intentions" across multi-turn conversations
- Learning efficiency: In-context learning is far less sample-efficient than human learning
- Metacognition: LLMs struggle to accurately assess what they "know" and "don't know"
- Scalable memory: How to manage ever-growing memory without losing critical information
- Multimodal integration: How to uniformly process text, images, code, and other modalities
## References
- Sumers, T. et al. (2024). Cognitive Architectures for Language Agents. arXiv:2309.02427.
- Anthropic. (2024). Building Effective Agents. anthropic.com.
- Weng, L. (2023). LLM Powered Autonomous Agents. lilianweng.github.io.
- Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.
- Yao, S. et al. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. ICLR 2023.
- Wang, L. et al. (2024). A Survey on Large Language Model based Autonomous Agents. Frontiers of Computer Science.