# Memory Streams and Reflection Mechanisms
## Overview

The memory stream is the core memory architecture proposed by Park et al. (2023) in *Generative Agents*. It provides virtual embodied agents with a complete record of their experiences and achieves efficient memory use through tri-factor retrieval scoring and a reflection mechanism.
## Related Content
For foundational theories on episodic and semantic memory, see Episodic and Semantic Memory.
## Memory Stream Architecture

The memory stream is a temporally ordered list of memory objects, each containing:

```python
class MemoryObject:
    """A single memory entry in the memory stream"""
    def __init__(self):
        self.id: int = 0                  # Unique memory ID
        self.description: str = ""        # Natural language description
        self.creation_time: float = 0.0   # Creation timestamp
        self.last_access: float = 0.0     # Last access time
        self.importance: int = 1          # Importance score (1-10)
        self.embedding: list = []         # Semantic embedding vector
        self.type: str = "observation"    # "observation" | "reflection" | "plan"
        self.related_ids: list = []       # IDs of related memories (used by reflections)
```
```mermaid
graph TD
    subgraph MS["Memory Stream"]
        M1["Observation: Saw John at the library<br/>t=8:00, imp=2"]
        M2["Observation: Discussed project with Maria<br/>t=9:30, imp=5"]
        M3["Observation: Heard about the party<br/>t=10:15, imp=6"]
        M4["Reflection: John has been at the library often lately<br/>t=10:30, imp=7"]
        M5["Observation: Received meeting invitation<br/>t=11:00, imp=4"]
        M6["Reflection: Should invite Maria to the party<br/>t=11:30, imp=8"]
    end
    M1 --> M4
    M3 --> M4
    M3 --> M6
    M2 --> M6
```
## Tri-Factor Retrieval Scoring

When the agent needs to retrieve relevant memories from the memory stream, it scores each candidate with a weighted combination of three factors:

\[
\text{score}(m, q) = \alpha \cdot \text{recency}(m) + \beta \cdot \text{importance}(m) + \gamma \cdot \text{relevance}(m, q)
\]

where:

- \(m\) is the memory object
- \(q\) is the current query (current context)
- \(\alpha, \beta, \gamma\) are tunable weight hyperparameters
### Factor 1: Recency

Recency scoring uses an exponential decay function, so more recently accessed memories score higher:

\[
\text{recency}(m) = e^{-\lambda \, \Delta t}
\]

where:

- \(\Delta t = t_{\text{now}} - t_{\text{last\_access}}(m)\) is the time since last access
- \(\lambda\) is the decay rate parameter
```python
import math

def recency_score(memory, current_time, decay_factor=0.995):
    """Compute recency score.

    Args:
        memory: Memory object
        current_time: Current time (hours)
        decay_factor: Per-hour decay factor; smaller values mean faster
            decay. Equivalent to exp(-lambda) with lambda = -ln(decay_factor).
    Returns:
        Recency score between 0 and 1
    """
    hours_passed = current_time - memory.last_access
    return decay_factor ** hours_passed
```
The choice of decay rate \(\lambda\) affects the memory's "shelf life". (The default decay factor of 0.995 in the code corresponds to \(\lambda = -\ln 0.995 \approx 0.005\), a half-life of roughly 138 hours.)
| \(\lambda\) | Half-life | Use Case |
|---|---|---|
| 0.01 | ~69 hours | Long-term social relationships |
| 0.05 | ~14 hours | Daily events |
| 0.1 | ~7 hours | Immediate conversation |
| 0.5 | ~1.4 hours | Short-term tasks |
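The half-lives in the table follow from \(t_{1/2} = \ln 2 / \lambda\); a quick numeric check against the exponential-decay form:

```python
import math

def recency(hours_passed, decay_rate):
    """Exponential-decay recency: exp(-lambda * dt)."""
    return math.exp(-decay_rate * hours_passed)

# At the half-life t_1/2 = ln(2) / lambda the score drops to exactly 0.5
for lam in (0.01, 0.05, 0.1, 0.5):
    half_life = math.log(2) / lam
    print(f"lambda={lam}: half-life ~{half_life:.1f} h, "
          f"score there = {recency(half_life, lam):.2f}")
```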
### Factor 2: Importance

The importance score is assigned once by the LLM, at memory creation time:
**Scoring prompt:**

```text
On the scale of 1 to 10, where 1 is purely mundane (e.g., brushing
teeth, making bed) and 10 is extremely poignant (e.g., a break up,
college acceptance), rate the likely poignancy of the following
piece of memory.
Memory: {memory_description}
Rating: <fill in>
```
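To turn the model's free-form reply into a usable score, it has to be parsed defensively. A minimal sketch (the prompt constant mirrors the text above; `parse_importance` and its fallback default are illustrative helpers, not the paper's code):

```python
import re

# Prompt template from the paper, with a slot for the memory text
IMPORTANCE_PROMPT = (
    "On the scale of 1 to 10, where 1 is purely mundane (e.g., brushing "
    "teeth, making bed) and 10 is extremely poignant (e.g., a break up, "
    "college acceptance), rate the likely poignancy of the following "
    "piece of memory.\n"
    "Memory: {memory_description}\n"
    "Rating: "
)

def parse_importance(llm_reply, default=3):
    """Extract the first integer in 1-10 from the model's reply.

    Falls back to a mid-low default when the reply is unparseable."""
    match = re.search(r"\b([1-9]|10)\b", llm_reply)
    return int(match.group(1)) if match else default

print(parse_importance("Rating: 8"))    # 8
print(parse_importance("I'd say 10."))  # 10
print(parse_importance("hard to say"))  # 3 (fallback)
```

The fallback matters in practice: a single unparseable reply should not crash memory creation, and a conservative default keeps such memories retrievable without inflating their weight.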
Importance scores are normalized to \([0, 1]\):

\[
\text{importance}(m) = \frac{I_m - 1}{9}, \qquad I_m \in \{1, \dots, 10\}
\]
**Importance score examples:**
- "Brushing teeth" -> 1 (completely mundane)
- "Seeing a colleague at a coffee shop" -> 3 (mild social event)
- "Learning a close friend is getting married" -> 8 (significant social event)
- "Being told about losing a job" -> 10 (major life event)
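Applying the \((I - 1)/9\) normalization from above to these ratings (a minimal helper for illustration):

```python
def normalize_importance(rating):
    """Map a raw 1-10 importance rating onto [0, 1]."""
    return (rating - 1) / 9.0

print(normalize_importance(1))   # 0.0  (brushing teeth: completely mundane)
print(normalize_importance(10))  # 1.0  (losing a job: major life event)
```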
### Factor 3: Relevance

Relevance is computed as the cosine similarity of semantic embeddings:

\[
\text{relevance}(m, q) = \frac{\mathbf{e}_q \cdot \mathbf{e}_m}{\lVert \mathbf{e}_q \rVert \, \lVert \mathbf{e}_m \rVert}
\]

where \(\mathbf{e}_q\) and \(\mathbf{e}_m\) are the embedding vectors of the query and memory, respectively.
```python
import numpy as np

def relevance_score(query_embedding, memory_embedding):
    """Compute relevance score (cosine similarity)"""
    dot_product = np.dot(query_embedding, memory_embedding)
    norm_product = np.linalg.norm(query_embedding) * np.linalg.norm(memory_embedding)
    if norm_product == 0:
        return 0.0
    return dot_product / norm_product
```
## Complete Retrieval Process

```python
def retrieve_memories(memory_stream, query, current_time, top_k=10,
                      alpha=1.0, beta=1.0, gamma=1.0, decay_factor=0.995):
    """Retrieve the most relevant memories from the memory stream.

    Tri-factor weighted retrieval: recency + importance + relevance.
    Assumes get_embedding() wraps an embedding model.
    """
    query_embedding = get_embedding(query)
    scored_memories = []
    for memory in memory_stream:
        rec = recency_score(memory, current_time, decay_factor)
        imp = (memory.importance - 1) / 9.0  # Normalize 1-10 rating to [0, 1]
        rel = relevance_score(query_embedding, memory.embedding)
        total = alpha * rec + beta * imp + gamma * rel
        scored_memories.append((memory, total))
    # Sort by total score and keep the top_k
    scored_memories.sort(key=lambda x: x[1], reverse=True)
    # Retrieval counts as access: refresh last_access for returned memories
    for memory, _ in scored_memories[:top_k]:
        memory.last_access = current_time
    return [m for m, _ in scored_memories[:top_k]]
```
## Reflection Mechanism

Reflection is one of the most important innovations in the memory stream architecture: agents not only store raw observations but also synthesize higher-level, abstract insights from groups of related memories.
### Reflection Trigger Condition

Reflection is triggered when the summed importance of memories created since the last reflection exceeds a threshold:

\[
\sum_{m \,:\, t_{\text{create}}(m) > t_{\text{last\_reflect}}} I_m \geq \theta_{\text{reflect}}
\]

Typically \(\theta_{\text{reflect}} = 150\), i.e. reflection triggers once the cumulative importance reaches 150.
### Reflection Generation Process

```mermaid
graph TD
    A[Trigger Reflection] --> B[Determine Reflection Topics]
    B --> C["Retrieve Related Memories<br/>top 100"]
    C --> D[LLM Generates High-Level Insights]
    D --> E[Create Reflection Memory Object]
    E --> F[Add to Memory Stream]
    F --> G[Reset Importance Accumulator]
    B -->|Prompt| B1["Given only the information above,<br/>what are 3 most salient high-level<br/>questions we can answer?"]
    D -->|Prompt| D1["What 5 high-level insights can you<br/>infer from the above statements?"]
```
### Reflection Hierarchy
Reflection can be performed recursively, forming multiple levels of abstraction:
- Level 0 (Observation): "Klaus is painting oil paintings in the studio"
- Level 1 (Reflection): "Klaus is passionate about art"
- Level 2 (Meta-reflection): "Klaus is a person whose core identity revolves around creativity"
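Because each reflection records the IDs of the memories it was derived from (`related_ids`), this hierarchy can be traversed back down to the raw observations. A self-contained sketch, using a toy `Mem` dataclass as a stand-in for the memory object:

```python
from dataclasses import dataclass, field

@dataclass
class Mem:
    id: int
    type: str
    description: str
    related_ids: list = field(default_factory=list)

def evidence_chain(memory, memory_by_id, depth=0):
    """Yield (depth, memory) pairs from a reflection down to its evidence."""
    yield depth, memory
    for mid in memory.related_ids:
        yield from evidence_chain(memory_by_id[mid], memory_by_id, depth + 1)

# Level-0 observations feed a level-1 reflection, which feeds a level-2 one
mems = [
    Mem(1, "observation", "Klaus is painting oil paintings in the studio"),
    Mem(2, "observation", "Klaus signed up for a sculpture class"),
    Mem(3, "reflection", "Klaus is passionate about art", [1, 2]),
    Mem(4, "reflection", "Klaus's core identity revolves around creativity", [3]),
]
by_id = {m.id: m for m in mems}

for depth, m in evidence_chain(by_id[4], by_id):
    print("  " * depth + f"[{m.type}] {m.description}")
```

Walking the chain this way is also how an implementation can cite the evidence behind an insight when the agent is asked to justify its behavior.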
The trigger check and generation loop (helpers such as `get_recent_memories`, `generate_reflection_topics`, `generate_insights`, and `rate_importance` wrap retrieval or LLM calls):

```python
def maybe_reflect(agent, threshold=150):
    """Check whether reflection should be triggered, and run it if so"""
    recent_importance_sum = sum(
        m.importance for m in agent.memory_stream
        if m.creation_time > agent.last_reflection_time
    )
    if recent_importance_sum >= threshold:
        # Generate reflection topics from the most recent memories
        recent_memories = get_recent_memories(agent, n=100)
        topics = generate_reflection_topics(recent_memories)
        for topic in topics:
            # Retrieve memories related to the topic
            relevant = retrieve_memories(agent.memory_stream, topic,
                                         current_time=agent.current_time)
            # Generate reflective insights
            insights = generate_insights(relevant, topic)
            for insight in insights:
                # Create reflection memory object
                reflection = MemoryObject()
                reflection.description = insight
                reflection.type = "reflection"
                reflection.importance = rate_importance(insight)
                reflection.creation_time = agent.current_time
                reflection.last_access = agent.current_time
                reflection.embedding = get_embedding(insight)  # Needed for later retrieval
                reflection.related_ids = [m.id for m in relevant]
                agent.memory_stream.append(reflection)
        # Reset the accumulator by advancing the reflection timestamp
        agent.last_reflection_time = agent.current_time
```
## Ablation Study Results
Park et al. conducted systematic ablation studies on the tri-factor retrieval and reflection mechanisms:
### Retrieval Factor Ablation
| Configuration | Behavior Plausibility Score | Notes |
|---|---|---|
| Full model (tri-factor + reflection) | 8.4 / 10 | Baseline |
| Remove recency | 7.2 / 10 | Agent repeatedly mentions outdated information |
| Remove importance | 7.5 / 10 | Cannot distinguish trivia from significant events |
| Remove relevance | 6.8 / 10 | Retrieves irrelevant memories |
| Recency only | 5.9 / 10 | Only remembers recent things |
| Relevance only | 6.3 / 10 | No temporal awareness |
### Reflection Mechanism Ablation
| Configuration | Behavior Plausibility Score | Key Observation |
|---|---|---|
| With reflection | 8.4 / 10 | Agent demonstrates deep understanding |
| Without reflection | 6.1 / 10 | Behavior remains at surface-level reactions |
| Without reflection + without planning | 4.8 / 10 | Behavior is nearly random |
### Key Finding
Removing the reflection mechanism caused the largest performance drop (-2.3 points), indicating that higher-level abstraction ability is crucial for credible agent behavior. Removing the relevance factor had the largest impact among the three retrieval factors (-1.6 points).
## Engineering Optimizations for Memory Streams

### Vector Indexing

When the memory stream grows to thousands of memories, brute-force scoring of every entry on each retrieval becomes inefficient:
```python
# Using FAISS for fast vector search (IndexFlatIP is exact; swap in an
# IVF or HNSW index for approximate search at larger scales)
import faiss
import numpy as np

class OptimizedMemoryStream:
    def __init__(self, embedding_dim=1536):
        self.index = faiss.IndexFlatIP(embedding_dim)  # Inner-product index
        self.memories = []

    def add_memory(self, memory):
        self.memories.append(memory)
        embedding = np.array([memory.embedding], dtype='float32')
        faiss.normalize_L2(embedding)  # After normalization, inner product = cosine similarity
        self.index.add(embedding)

    def search_relevant(self, query_embedding, top_k=50):
        query = np.array([query_embedding], dtype='float32')
        faiss.normalize_L2(query)
        scores, indices = self.index.search(query, top_k)
        return [(self.memories[i], scores[0][j])
                for j, i in enumerate(indices[0])
                if i != -1]  # FAISS pads with -1 when top_k exceeds index size
```
### Memory Compression
Long-running agents need memory compression strategies:
- Forgetting: Low importance + low access frequency memories are deleted
- Merging: Similar low-level memories are merged into summaries
- Tiering: Old memories move to "long-term storage" with reduced retrieval frequency
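A minimal sketch of the forgetting strategy, assuming a retention score that blends normalized importance with staleness; the 0.7/0.3 weights and the staleness decay rate are illustrative choices, not from the paper:

```python
import math
from types import SimpleNamespace

def prune_memories(memories, current_time, keep_fraction=0.8):
    """Forget the least valuable memories.

    Retention blends normalized importance with a recency term, so that
    important-but-old and trivial-but-fresh memories can both survive.
    """
    def retention(m):
        staleness = current_time - m.last_access  # Hours since last access
        return 0.7 * (m.importance - 1) / 9.0 + 0.3 * math.exp(-0.05 * staleness)

    ranked = sorted(memories, key=retention, reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]

# Toy memories: (importance 1-10, last_access in hours)
memories = [SimpleNamespace(importance=i, last_access=t)
            for i, t in [(9, 0.0), (2, 95.0), (2, 99.5), (7, 50.0)]]
survivors = prune_memories(memories, current_time=100.0, keep_fraction=0.5)
```

In this toy run the two high-importance memories outlive the recent-but-trivial ones, which is the intended asymmetry: forgetting should be driven by value, not by age alone.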
Importance Score Caching
# Batch evaluate importance to reduce LLM calls
def batch_rate_importance(descriptions, batch_size=10):
"""Batch evaluate memory importance to reduce API calls"""
results = []
for i in range(0, len(descriptions), batch_size):
batch = descriptions[i:i+batch_size]
prompt = format_batch_importance_prompt(batch)
ratings = call_llm(prompt)
results.extend(parse_ratings(ratings))
return results
## Comparison with Other Memory Systems
| Feature | Memory Stream (Park) | RAG | MemGPT | Traditional DB |
|---|---|---|---|---|
| Temporal awareness | Exponential decay | None | Limited | Queryable |
| Importance filtering | LLM scoring | None | Tiered management | Manual tagging |
| Semantic retrieval | Embedding similarity | Embedding similarity | Embedding similarity | SQL queries |
| Abstraction ability | Reflection mechanism | None | Limited | None |
| Scalability | Medium | High | Medium | High |
## Summary
Core contributions of the memory stream and reflection mechanisms:
- Tri-factor retrieval provides a more human-like memory access pattern than pure semantic retrieval
- Reflection mechanism enables agents to extract high-level insights from experience rather than remaining at the surface
- Ablation studies validate the necessity of each component, especially the critical role of reflection
- This architecture established the memory system design paradigm for subsequent virtual embodied agent research