Agent Milestones

Overview

This article reviews the technological milestones of agents from the 1960s to the present. Each system achieved a breakthrough in a particular dimension. By analyzing these milestones, we can trace the evolution of agent technology from rule-based systems to learning systems, and from specialized systems to general-purpose ones.


Milestone Timeline

timeline
    title Agent Technology Milestones
    section Conversation & Understanding
        1966 ELIZA : First conversational program
        1971 SHRDLU : Language understanding in restricted worlds
        2011 Siri : Mainstream voice assistant
        2022 ChatGPT : LLM conversational agent
    section Knowledge & Reasoning
        1972 MYCIN : Medical expert system
        1987 BDI : Rational agent theory
        1991 SOAR : General cognitive architecture
        2023 CoT/ReAct : LLM reasoning + action
    section Planning & Action
        1969 Shakey : Autonomous mobile robot
        1997 Deep Blue : Game tree search
        2016 AlphaGo : Deep RL decision-making
        2023 Voyager : Open-world exploration
    section Autonomous Agents
        2023 AutoGPT : Autonomous task execution
        2023 BabyAGI : Task-driven agent
        2025 Claude Code : Programming agent
        2024 Devin : Autonomous software engineer
        2025 Operator : Browser agent

1. ELIZA (1966) -- The Illusion of Conversation

Developer: Joseph Weizenbaum, MIT

Technical Approach: A dialogue system based on pattern matching and substitution rules, simulating a Rogerian psychotherapist.

Breakthrough Significance:

  • First demonstrated that humans project emotions onto machine conversations (the ELIZA effect)
  • Revealed the philosophical gap between "understanding" and "appearing to understand"
  • Pioneered human-computer dialogue research

Core Mechanism:

User Input → Keyword Matching → Template Filling → Response Output
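
The pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not Weizenbaum's original script: the rules in `RULES`, the `REFLECTIONS` table, and the default reply are all hypothetical.

```python
import re

# Hypothetical rules: (keyword pattern, response template).
# These are illustrative and not taken from the original ELIZA script.
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]

# Pronoun reflection, so that "my job" is echoed back as "your job".
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are", "you": "I", "your": "my"}

DEFAULT_REPLY = "Please go on."


def reflect(fragment: str) -> str:
    """Swap first- and second-person words in a captured fragment."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())


def respond(user_input: str) -> str:
    """Keyword matching followed by template filling."""
    text = user_input.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*(reflect(group) for group in match.groups()))
    return DEFAULT_REPLY  # no keyword matched


print(respond("I need a break from my job."))
```

Nothing in this loop models meaning: the reply is assembled entirely from surface patterns, which is what makes the ELIZA effect notable.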

Subsequent Impact: The spiritual predecessor of all dialogue systems, from Alexa to ChatGPT.


2. MYCIN (1972) -- The Power of Knowledge

Developer: Edward Shortliffe, Stanford

Technical Approach: A medical diagnosis expert system based on ~600 production rules, using Certainty Factors to handle uncertainty.

Certainty Factor Computation:

\[ CF(H, E) = MB(H, E) - MD(H, E) \]

where \(MB\) is the Measure of Belief and \(MD\) is the Measure of Disbelief.
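
As a rough sketch, the definition above maps directly to code. The `combine` function is an addition that the text does not describe: it follows the combination rule commonly cited for MYCIN-style systems (the EMYCIN form), and should be read as an assumption rather than as MYCIN's exact implementation.

```python
def certainty_factor(mb: float, md: float) -> float:
    """CF(H, E) = MB(H, E) - MD(H, E), with MB and MD each in [0, 1]."""
    if not (0.0 <= mb <= 1.0 and 0.0 <= md <= 1.0):
        raise ValueError("MB and MD must lie in [0, 1]")
    return mb - md


def combine(cf1: float, cf2: float) -> float:
    """Combine two CFs for the same hypothesis from independent rules.

    Assumption: the commonly cited EMYCIN-style combination rule.
    The mixed-sign case is undefined when one CF is +1 and the other -1.
    """
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))


print(certainty_factor(0.8, 0.1))
print(combine(0.6, 0.4))
```

Two supporting rules reinforce each other without ever exceeding 1, while conflicting evidence pulls the combined value toward 0.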

Breakthrough Significance:

  • Among the first AI systems to reach expert-level performance in a specific domain
  • Supported the view that domain knowledge (rather than general reasoning algorithms) is the key to intelligent performance
  • Introduced a practical framework for reasoning under uncertainty

3. Shakey (1969) -- Unifying Thought and Action

Developer: SRI International

Technical Approach: A mobile robot integrating visual perception, symbolic reasoning (STRIPS planner), and motor control.

Breakthrough Significance:

  • First physical agent to integrate perception-planning-execution into a complete loop
  • The STRIPS planning representation remains foundational in planning research today
  • Demonstrated that symbolic reasoning can drive physical actions
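
A STRIPS operator is usually described by its preconditions, an add list, and a delete list. The sketch below shows that representation on a toy domain; the predicate strings and the `go_a_to_b` operator are hypothetical and are not taken from Shakey's actual world model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset  # facts that must hold before the action
    add_list: frozenset       # facts made true by the action
    delete_list: frozenset    # facts made false by the action

    def applicable(self, state: frozenset) -> bool:
        return self.preconditions <= state

    def apply(self, state: frozenset) -> frozenset:
        if not self.applicable(state):
            raise ValueError(f"{self.name} is not applicable")
        return (state - self.delete_list) | self.add_list


# Hypothetical toy domain: a robot moving between two rooms.
go_a_to_b = Operator(
    name="go(roomA, roomB)",
    preconditions=frozenset({"at(robot, roomA)", "door_open(roomA, roomB)"}),
    add_list=frozenset({"at(robot, roomB)"}),
    delete_list=frozenset({"at(robot, roomA)"}),
)

initial = frozenset({"at(robot, roomA)", "door_open(roomA, roomB)"})
after = go_a_to_b.apply(initial)
print(sorted(after))
```

A planner then searches for a sequence of such operators whose successive application turns the initial state into one that satisfies the goal.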

4. BDI Model (1987) -- Formalizing Rationality

Proposer: Michael Bratman, Stanford

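Technical Approach: A model of rational agents grounded in Bratman's philosophy of intention and practical reasoning, later given a formal and computational treatment (notably by Rao and Georgeff).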

Formal Representation:

\[ \text{Agent} = \langle B, D, I, \text{Plan Library} \rangle \]
  • \(B\): Belief set (information about world states)
  • \(D\): Desire set (goals the agent hopes to achieve)
  • \(I\): Intention set (committed plans of action)
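
One way to read the tuple above is as the state of a control loop: revise beliefs, deliberate over desires, commit to intentions, then execute a plan. The following is a deliberately simplified sketch under that reading. The `BDIAgent` class, its `step` method, and the assumption that plans always succeed are illustrative and do not reproduce PRS or Jason.

```python
from dataclasses import dataclass, field


@dataclass
class BDIAgent:
    beliefs: set = field(default_factory=set)          # B
    desires: list = field(default_factory=list)        # D, in priority order
    intentions: list = field(default_factory=list)     # I, committed goals
    plan_library: dict = field(default_factory=dict)   # goal -> (context, actions)

    def step(self, percepts: set) -> list:
        # 1. Belief revision: fold new percepts into the belief set.
        self.beliefs |= percepts
        # 2. Deliberation: commit to unachieved desires whose plan context holds.
        for goal in self.desires:
            if goal in self.beliefs or goal in self.intentions:
                continue
            context, _ = self.plan_library.get(goal, (None, None))
            if context is not None and context <= self.beliefs:
                self.intentions.append(goal)
        # 3. Means-ends reasoning and execution for the first intention.
        if not self.intentions:
            return []
        goal = self.intentions.pop(0)
        _, actions = self.plan_library[goal]
        self.beliefs.add(goal)  # toy assumption: the plan always succeeds
        return actions


agent = BDIAgent(
    desires=["has_coffee"],
    plan_library={"has_coffee": ({"in_kitchen"}, ["grind", "brew"])},
)
print(agent.step(set()))           # no applicable plan yet
print(agent.step({"in_kitchen"}))  # context now holds, so the plan runs
```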

Breakthrough Significance:

  • Provided a rigorous philosophical and computational foundation for agent "mental states"
  • Gave rise to practical systems such as PRS, AgentSpeak(L), and Jason
  • The BDI loop remains a core pattern in agent design today

Cross-Reference

For the complete formalization and implementation of the BDI model, see BDI Model.


5. SOAR (1991) -- General Cognitive Architecture

Developers: John Laird, Allen Newell, Paul Rosenbloom

Technical Approach: A general cognitive architecture based on problem-space search, with learning through chunking.

Breakthrough Significance:

  • First cognitive architecture aimed at achieving general intelligence
  • Unified problem solving, learning, and knowledge representation
  • Has been in continuous development for over 30 years, influencing all subsequent cognitive architecture research

6. Deep Blue (1997) -- Game Tree Search

Developer: IBM

Technical Approach: Specialized hardware + Alpha-Beta pruning + opening book + endgame database, evaluating 200 million chess positions per second.

Breakthrough Significance:

  • First computer to defeat a reigning world chess champion (Garry Kasparov) in a match played under standard tournament time controls
  • Demonstrated the power of brute-force search combined with domain knowledge
  • Critics argued this was not "true intelligence"
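
Alpha-Beta pruning, the search component named above, can be sketched on a toy game tree. This is a generic textbook version, not Deep Blue's hardware search: the `TREE` and `LEAF` tables are hypothetical, and `children` and `evaluate` stand in for move generation and a hand-tuned evaluation function.

```python
import math


def alphabeta(node, depth, alpha, beta, maximizing, children, evaluate):
    """Minimax value of `node`, skipping branches that cannot change the result."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing:
        value = -math.inf
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizing opponent will never allow this line
        return value
    value = math.inf
    for child in kids:
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True, children, evaluate))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizing player already has a better option
    return value


# Hypothetical two-ply tree with leaf scores.
TREE = {"root": ["L", "R"], "L": ["L1", "L2"], "R": ["R1", "R2"]}
LEAF = {"L1": 3, "L2": 5, "R1": 2, "R2": 9}
visited = []


def children(node):
    return TREE.get(node, [])


def evaluate(node):
    visited.append(node)  # record which leaves were actually scored
    return LEAF[node]


best = alphabeta("root", 2, -math.inf, math.inf, True, children, evaluate)
print(best, visited)
```

In this tree the leaf `R2` is never evaluated: once `R1` scores below the value already guaranteed by the left branch, the rest of the right branch cannot matter.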

7. AlphaGo (2016) -- Learning Surpasses Knowledge

Developer: DeepMind

Technical Approach: Deep neural networks (policy network + value network) + Monte Carlo Tree Search (MCTS) + self-play reinforcement learning.

MCTS Action Selection:

\[ a^* = \arg\max_a \left[ Q(s, a) + c_{puct} \cdot P(s, a) \cdot \frac{\sqrt{N(s)}}{1 + N(s, a)} \right] \]
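
The selection rule can be written as a short function. This sketch takes \(N(s)\) to be the total visit count over all actions at state \(s\). The dictionary-based interface and the default `c_puct=1.0` are assumptions made for illustration, not AlphaGo's actual data structures or settings.

```python
import math


def puct_select(q, prior, visits, c_puct=1.0):
    """Pick the action maximizing Q(s,a) + c_puct * P(s,a) * sqrt(N(s)) / (1 + N(s,a)).

    q, prior, visits: dicts keyed by action, holding Q(s,a), P(s,a), and N(s,a).
    """
    total_visits = sum(visits.values())  # N(s)

    def score(action):
        exploration = c_puct * prior[action] * math.sqrt(total_visits) / (1 + visits[action])
        return q[action] + exploration

    return max(q, key=score)


# Hypothetical statistics at one node: "b" is unvisited but has a high prior.
q = {"a": 0.5, "b": 0.4}
prior = {"a": 0.2, "b": 0.8}
visits = {"a": 10, "b": 0}
print(puct_select(q, prior, visits))
```

The exploration term shrinks as an action accumulates visits, so the search gradually shifts from following the policy network's prior to trusting the measured value \(Q(s, a)\).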

Breakthrough Significance:

  • Achieved superhuman performance in Go (roughly \(10^{170}\) legal positions), a feat many experts had expected to be at least a decade away
  • Demonstrated superhuman capabilities of deep RL in complex decision tasks
  • AlphaZero (2017) further proved that pure self-play without human knowledge can achieve superhuman level

8. GPT-3 (2020) -- Language as Intelligence

Developer: OpenAI

Technical Approach: A 175B-parameter autoregressive language model demonstrating powerful few-shot learning capabilities.

Breakthrough Significance:

  • Showed that sufficiently large language models can perform diverse tasks from a few in-prompt examples, without task-specific fine-tuning
  • In-context learning ushered in the prompt engineering era
  • Laid the foundation for subsequent LLM agents (ChatGPT, AutoGPT)
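
In-context learning amounts to placing worked examples in the prompt and letting the model continue the pattern. The helper below is a minimal sketch of that idea; the `Input:`/`Output:` layout and the function name `build_few_shot_prompt` are assumptions, and no model API is called.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, k worked examples, and an unanswered query."""
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # the model is expected to continue from here
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("sea otter", "loutre de mer")],
    "peppermint",
)
print(prompt)
```

No weights change between tasks: switching from translation to, say, classification only requires a different instruction and different examples.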

9. AutoGPT / BabyAGI (March 2023) -- The Awakening of Autonomy

AutoGPT

Developer: Significant Gravitas (Toran Bruce Richards)

Technical Approach: GPT-4 + task decomposition + memory storage + web access + self-prompting loop.

BabyAGI

Developer: Yohei Nakajima

Technical Approach: GPT-4 + task creation/prioritization/execution loop + vector database memory.

Shared Breakthroughs:

  • Among the first widely seen demonstrations that LLMs could autonomously decompose and execute complex tasks
  • Triggered worldwide attention and investment in autonomous agents
  • Although limited in practical reliability, they defined the direction for subsequent research
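
The task creation/prioritization/execution loop can be sketched with the LLM calls replaced by plain functions. This is an illustrative skeleton only: `run_task_loop` and the three `fake_*` stubs are hypothetical, vector-database memory is omitted, and neither project's real code is reproduced here.

```python
from collections import deque


def run_task_loop(objective, first_task, execute, create_tasks, prioritize, max_steps=5):
    """Run the execute -> create -> prioritize cycle until the queue is empty."""
    tasks = deque([first_task])
    results = []
    for _ in range(max_steps):  # hard cap so the loop cannot run forever
        if not tasks:
            break
        task = tasks.popleft()
        result = execute(objective, task)                   # an LLM call in practice
        results.append((task, result))
        new_tasks = create_tasks(objective, task, result)   # an LLM call in practice
        tasks = deque(prioritize(objective, list(tasks) + new_tasks))
    return results


# Stub functions standing in for LLM calls (assumptions for illustration).
def fake_execute(objective, task):
    return f"done: {task}"


def fake_create(objective, task, result):
    return ["summarize findings"] if task == "research topic" else []


def fake_prioritize(objective, tasks):
    return sorted(tasks)


log = run_task_loop("write a report", "research topic", fake_execute, fake_create, fake_prioritize)
print(log)
```

The `max_steps` cap reflects a practical concern with such loops: without a budget, an agent that keeps generating new tasks never terminates.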

10. Voyager (2023) -- Lifelong Learning in Open Worlds

Developers: NVIDIA + Caltech + UT Austin

Technical Approach: In Minecraft, a GPT-4-driven agent achieves skill acquisition, skill library accumulation, and automatic curriculum design through code generation.

Breakthrough Significance:

  • Described by its authors as the first LLM-powered embodied lifelong learning agent in Minecraft
  • The Skill Library mechanism enables knowledge accumulation and reuse
  • Automatic Curriculum enables autonomous exploration from simple to complex
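
The skill library idea, storing verified programs and retrieving them for new tasks, can be sketched as follows. This is a simplification under stated assumptions: retrieval here uses plain word overlap as a stand-in for embedding similarity, and the `SkillLibrary` class, skill names, and code strings are hypothetical.

```python
class SkillLibrary:
    """Stores skills as (description, code) pairs and retrieves them by task text."""

    def __init__(self):
        self._skills = {}  # name -> (description, code)

    def add(self, name, description, code):
        self._skills[name] = (description, code)

    def retrieve(self, task, k=1):
        """Return the names of the k skills whose descriptions best match the task."""
        task_words = set(task.lower().split())

        def overlap(item):
            description = item[1][0]
            return len(task_words & set(description.lower().split()))

        ranked = sorted(self._skills.items(), key=overlap, reverse=True)
        return [name for name, _ in ranked[:k]]


library = SkillLibrary()
library.add("mine_wood", "chop a tree to collect wood logs", "// hypothetical skill body")
library.add("craft_table", "craft a crafting table from wood planks", "// hypothetical skill body")
print(library.retrieve("collect wood logs"))
```

Because retrieved skills are executable code rather than free text, a new skill can call earlier ones, which is how simple behaviors compound into more complex ones.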

11. Claude Code (2025) -- A Reliable Programming Partner

Developer: Anthropic

Technical Approach: Claude model + file system operations + command-line execution + multi-turn interaction + Human-in-the-Loop.

Breakthrough Significance:

  • One of the first production-grade programming agents
  • Demonstrated that LLM agents can work reliably on real software engineering tasks
  • Human-in-the-Loop pattern balances autonomy with safety
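
The Human-in-the-Loop pattern can be illustrated with an approval gate placed in front of command execution. The sketch below is a generic illustration and does not describe Claude Code's actual implementation: `run_with_approval` and the `approve` callback are hypothetical names.

```python
import subprocess
import sys
from typing import Callable, List


def run_with_approval(command: List[str], approve: Callable[[List[str]], bool]) -> str:
    """Execute `command` only if the human-supplied `approve` callback returns True."""
    if not approve(command):
        return "rejected by user"
    completed = subprocess.run(command, capture_output=True, text=True, timeout=30)
    return completed.stdout.strip()


def ask_on_terminal(command: List[str]) -> bool:
    """Interactive approval: show the proposed command and wait for y/n."""
    return input(f"Run {' '.join(command)}? [y/N] ").strip().lower() == "y"


# Non-interactive demonstration with an approve-everything callback.
proposed = [sys.executable, "-c", "print('hello from the agent')"]
print(run_with_approval(proposed, approve=lambda cmd: True))
```

Keeping the approval decision in a callback makes the policy swappable: the same agent loop can run with interactive confirmation, an allow-list, or full autonomy in a sandbox.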

12. Devin (2024) -- Autonomous Software Engineer

Developer: Cognition AI

Technical Approach: A full-stack autonomous programming agent integrating IDE, browser, and terminal.

Breakthrough Significance:

  • First commercial product claiming to be an "AI software engineer"
  • Cognition reported that Devin resolved 13.86% of issues unassisted on a sampled subset of SWE-bench
  • Sparked widespread discussion about whether AI will replace programmers

13. OpenAI Operator (January 2025) -- Autonomy in the Browser

Developer: OpenAI

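Technical Approach: A browser agent powered by OpenAI's Computer-Using Agent (CUA) model, which combines GPT-4o's vision capabilities with reinforcement learning. It browses web pages, fills in forms, and completes tasks such as shopping autonomously.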

Breakthrough Significance:

  • One of the first commercial autonomous browser agents from a major AI lab
  • Extended agents from "conversation" to "operating interfaces on behalf of humans"
  • Helped popularize the Computer Use paradigm

Comparative Analysis of Milestones

| Milestone | Knowledge Source | Reasoning Method | Action Space | Learning Ability | Generality |
|---|---|---|---|---|---|
| ELIZA | Manual rules | Pattern matching | Text replies | None | Single domain |
| MYCIN | Expert knowledge | Backward chaining | Diagnostic suggestions | None | Single domain |
| Shakey | Manual model | STRIPS planning | Physical movement | None | Restricted env. |
| SOAR | Production rules | Problem-space search | Symbolic operations | Chunking | Multi-domain |
| Deep Blue | Evaluation function | Alpha-Beta search | Chess moves | None | Single domain |
| AlphaGo | Human expert games + self-play | MCTS + NN | Go moves | Deep RL | Board games |
| GPT-3 | Pre-training corpora | Autoregressive generation | Text | In-context | Multi-domain |
| AutoGPT | Pre-training + tools | CoT + reflection | Text + tools | Memory accumulation | Multi-domain |
| Voyager | Pre-training + code | CoT + code generation | Minecraft | Skill library | Game world |
| Claude Code | Pre-training + tools | Multi-step reasoning | Code + files + CLI | In-context learning | Software eng. |
| Operator | Pre-training + browser | Vision + reasoning | Web operations | In-context learning | Web tasks |

Patterns of Development

1. Capability Leap Pattern

Major breakthroughs tend to follow a similar pattern:

graph LR
    A[Theory Proposed] --> B[Restricted Prototype]
    B --> C[Domain Validation]
    C --> D[Engineering Optimization]
    D --> E[Large-Scale Deployment]
    E --> F[Inspires New Theory]
    F --> A

2. Key Turning Points

  1. Knowledge Acquisition Bottleneck (1980s): The excessive cost of knowledge engineering in expert systems drove the rise of machine learning
  2. Symbol Grounding Problem (1990s): Symbolic systems lacked perceptual and motor foundations, driving embodied agent research
  3. Data-Driven Revolution (2010s): Deep learning demonstrated the power of learning representations from data
  4. Language as Interface (2020s): LLMs turned natural language into a universal control interface for agents

3. Open Problems

  • Long-term planning: Existing agents still struggle with reliable planning beyond dozens of steps
  • World models: A gap remains between the "world knowledge" of LLMs and true causal world models
  • Continual learning: How to continuously accumulate new capabilities without forgetting old knowledge
  • Safety alignment: How to ensure highly autonomous agents align with human intentions

References

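  1. Weizenbaum, J. (1966). ELIZA: A Computer Program for the Study of Natural Language Communication Between Man and Machine. Communications of the ACM, 9(1), 36-45.
  2. Shortliffe, E.H. (1976). Computer-Based Medical Consultations: MYCIN. Elsevier.
  3. Nilsson, N.J. (1984). Shakey the Robot. Technical Note 323, SRI International.
  4. Bratman, M.E. (1987). Intention, Plans, and Practical Reason. Harvard University Press.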
  5. Laird, J.E. (2012). The Soar Cognitive Architecture. MIT Press.
  6. Silver, D. et al. (2016). Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature, 529, 484-489.
  7. Brown, T. et al. (2020). Language Models are Few-Shot Learners. NeurIPS 2020.
  8. Wang, G. et al. (2023). Voyager: An Open-Ended Embodied Agent with Large Language Models. arXiv:2305.16291.
