Key Conferences and Papers
Overview
Agent research spans multiple disciplines, with relevant work distributed across top venues in AI, NLP, robotics, and software engineering. This article surveys the most important academic conferences and foundational papers in the agent field, helping researchers quickly locate core literature.
1. Core Academic Conferences
1.1 Agent-Specific Conferences
| Conference | Full Name | Founded | Characteristics |
| --- | --- | --- | --- |
| AAMAS | International Conference on Autonomous Agents and Multiagent Systems | 2002 | The leading conference dedicated to agents and multi-agent systems |
| AAAI | AAAI Conference on Artificial Intelligence (run by the Association for the Advancement of Artificial Intelligence) | 1980 | Comprehensive AI conference with extensive agent work |
| IJCAI | International Joint Conference on Artificial Intelligence | 1969 | The earliest international AI conference |
1.2 Deep Learning and NLP Conferences
LLM agent research is primarily published at the following venues:
| Conference | Relevance to Agents | Representative Work |
| --- | --- | --- |
| NeurIPS | Agent workshops, reasoning methods | CoT, ToT, Reflexion, Toolformer, Self-Refine |
| ICML | RL-based agents, tool learning | RLHF methods |
| ICLR | LLM reasoning, agent architectures | ReAct, Self-Consistency, MetaGPT |
| ACL/EMNLP | Language agents, dialogue systems | - |
| COLM (Conference on Language Modeling, new in 2024) | LLM agent evaluation and design | - |
1.3 Robotics and Embodied Intelligence Conferences
| Conference | Relevance to Agents |
| --- | --- |
| ICRA | Robotic agents, embodied planning |
| IROS | Autonomous systems, multi-robot coordination |
| CoRL | Robot learning, embodied decision-making |
| RSS | Robotics: Science and Systems |
1.4 Important Workshops
| Workshop | Host Conference | Topic |
| --- | --- | --- |
| LLM Agents Workshop | NeurIPS 2023/2024 | Design and evaluation of LLM agents |
| Foundation Models for Decision Making | NeurIPS 2023 | Foundation models for decision-making |
| Agent Learning in Open-Endedness | ICML 2024 | Agent learning in open-ended worlds |
| Language Agents Workshop | ICLR 2024 | Language-driven agents |
2. Foundational Papers
2.1 Blog Posts and Surveys (Informal but Highly Influential)
| Year | Author | Title | Contribution |
| --- | --- | --- | --- |
| 2023.06 | Lilian Weng | LLM Powered Autonomous Agents | Defined the classic LLM agent framework: Planning + Memory + Tool Use |
| 2024.03 | Andrew Ng | Agentic Design Patterns | Systematically summarized four agent design patterns: Reflection, Tool Use, Planning, Multi-Agent Collaboration |
| 2024.12 | Anthropic | Building Effective Agents | Proposed engineering best practices for agent systems |
Recommended Starting Point
Lilian Weng's blog post is one of the most widely cited informal references in the LLM agent field and is recommended as a first read.
2.2 Reasoning and Chain-of-Thought
| Year | Paper | Venue | Core Contribution |
| --- | --- | --- | --- |
| 2022 | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | NeurIPS 2022 | Wei et al. proposed CoT, demonstrating that intermediate reasoning steps significantly improve LLM reasoning |
| 2022 | Self-Consistency Improves Chain of Thought Reasoning in Language Models | ICLR 2023 | Wang et al. proposed self-consistency: sample multiple reasoning paths and take a majority vote over their final answers |
| 2023 | Tree of Thoughts: Deliberate Problem Solving with Large Language Models | NeurIPS 2023 | Yao et al. extended reasoning from chains to trees, supporting backtracking and search |
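The majority-voting idea behind self-consistency can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `sample_path` stands in for a sampled LLM call, `fake_sampler` returns canned text, and the `Answer:` marker is an assumed output convention.

```python
from collections import Counter
from typing import Callable, Optional


def extract_answer(reasoning_path: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent."""
    marker = "Answer:"  # assumed output convention, not part of the paper
    index = reasoning_path.rfind(marker)
    if index == -1:
        return None
    return reasoning_path[index + len(marker):].strip()


def self_consistency(
    sample_path: Callable[[str], str],
    question: str,
    num_samples: int = 5,
) -> Optional[str]:
    """Sample several reasoning paths and return the most common final answer."""
    answers = []
    for _ in range(num_samples):
        answer = extract_answer(sample_path(question))
        if answer is not None:
            answers.append(answer)
    if not answers:
        return None
    # Majority vote; ties resolve to the answer that was seen first.
    return Counter(answers).most_common(1)[0][0]


# Hypothetical stand-in for a sampled LLM call: yields canned reasoning paths.
_canned = iter([
    "3 + 4 = 7, then 7 * 2 = 14. Answer: 14",
    "3 * 2 = 6, then 6 + 4 = 10. Answer: 10",
    "(3 + 4) * 2 = 14. Answer: 14",
])


def fake_sampler(question: str) -> str:
    return next(_canned)


result = self_consistency(fake_sampler, "What is (3 + 4) * 2?", num_samples=3)
print(result)  # 14
```

In practice the sampler would call a model with a nonzero temperature so that the reasoning paths actually differ; the vote then filters out paths that reach an outlier answer.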
2.3 Acting and Tool Use
| Year | Paper | Venue | Core Contribution |
| --- | --- | --- | --- |
| 2022 | ReAct: Synergizing Reasoning and Acting in Language Models | ICLR 2023 | Yao et al. proposed the Thought-Action-Observation loop, unifying reasoning and action |
| 2021 | WebGPT: Browser-assisted Question-answering with Human Feedback | arXiv | Nakano et al. fine-tuned an LLM to search with a text browser and cite its sources |
| 2023 | Toolformer: Language Models Can Teach Themselves to Use Tools | NeurIPS 2023 | Schick et al. showed an LLM can learn on its own when and how to call tools |
| 2023 | Gorilla: Large Language Model Connected with Massive APIs | arXiv | Patil et al. trained an LLM to call a large set of APIs accurately |
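The Thought-Action-Observation loop described for ReAct can be sketched as a small driver function. This is an illustrative sketch under stated assumptions, not the paper's code: `Policy`, `react_loop`, `scripted_policy`, the `finish` action name, and the `lookup` tool are all hypothetical names, and the scripted policy stands in for an LLM.

```python
from typing import Callable, Dict, List, Tuple

# A policy maps the transcript so far to (thought, action_name, action_input).
Policy = Callable[[List[str]], Tuple[str, str, str]]
OBS_PREFIX = "Observation: "


def react_loop(
    policy: Policy,
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> Tuple[str, List[str]]:
    """Alternate thoughts, actions, and observations until 'finish' or budget."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, action_input = policy(transcript)
        transcript.append(f"Thought: {thought}")
        transcript.append(f"Action: {action}[{action_input}]")
        if action == "finish":
            # The input of the (assumed) 'finish' action is the final answer.
            return action_input, transcript
        tool = tools.get(action)
        observation = tool(action_input) if tool else f"unknown tool '{action}'"
        transcript.append(f"{OBS_PREFIX}{observation}")
    return "", transcript  # step budget exhausted without a final answer


def scripted_policy(transcript: List[str]) -> Tuple[str, str, str]:
    """Hypothetical stand-in for an LLM: look something up once, then finish."""
    observations = [line for line in transcript if line.startswith(OBS_PREFIX)]
    if not observations:
        return ("I should look up the founding year.", "lookup", "AAMAS")
    year = observations[-1][len(OBS_PREFIX):]
    return ("The observation contains the answer.", "finish", year)


# Toy tool backed by a one-entry table.
tools = {"lookup": lambda key: {"AAMAS": "2002"}.get(key, "not found")}

answer, transcript = react_loop(scripted_policy, tools, "When was AAMAS founded?")
print(answer)  # 2002
print("\n".join(transcript))
```

A real agent would replace `scripted_policy` with a model call that reads the transcript as its prompt; the loop itself stays the same, which is why the transcript is kept as plain text lines.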
2.4 Reflection and Self-Improvement
| Year | Paper | Venue | Core Contribution |
| --- | --- | --- | --- |
| 2023 | Reflexion: Language Agents with Verbal Reinforcement Learning | NeurIPS 2023 | Shinn et al. replaced gradient updates with verbal self-reflection on past attempts |
| 2023 | Self-Refine: Iterative Refinement with Self-Feedback | NeurIPS 2023 | Madaan et al. proposed an iterative generate-feedback-refine loop |
| 2023 | Teaching Large Language Models to Self-Debug | ICLR 2024 | Chen et al. showed an LLM can debug its own code using execution feedback |
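The generate-feedback-refine loop can be sketched as follows. This is a toy illustration under stated assumptions rather than the Self-Refine implementation: in the paper one LLM plays all three roles, whereas here `generate`, `feedback`, and `refine` are hypothetical rule-based stand-ins, and returning `None` from `feedback` is an assumed stopping signal.

```python
from typing import Callable, List, Optional, Tuple


def self_refine(
    generate: Callable[[str], str],
    feedback: Callable[[str, str], Optional[str]],
    refine: Callable[[str, str, str], str],
    task: str,
    max_iters: int = 3,
) -> Tuple[str, List[str]]:
    """Draft once, then alternate critique and revision until no critique remains."""
    draft = generate(task)
    history = [draft]
    for _ in range(max_iters):
        critique = feedback(task, draft)
        if critique is None:  # assumed convention: None means "good enough"
            break
        draft = refine(task, draft, critique)
        history.append(draft)
    return draft, history


# Hypothetical rule-based stand-ins for the three LLM roles.
def generate(task: str) -> str:
    return "agents plan and act"


def feedback(task: str, draft: str) -> Optional[str]:
    if not draft[0].isupper():
        return "capitalize the first letter"
    if not draft.endswith("."):
        return "end with a period"
    return None


def refine(task: str, draft: str, critique: str) -> str:
    if critique == "capitalize the first letter":
        return draft[0].upper() + draft[1:]
    if critique == "end with a period":
        return draft + "."
    return draft


final, history = self_refine(generate, feedback, refine, "Write one clean sentence.")
print(final)         # Agents plan and act.
print(len(history))  # 3
```

The `max_iters` cap matters in practice because a model-based critic may never return an empty critique; the loop should stop on budget as well as on convergence.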
2.5 Agent Systems and Architectures
| Year | Paper | Venue | Core Contribution |
| --- | --- | --- | --- |
| 2023 | Generative Agents: Interactive Simulacra of Human Behavior | UIST 2023 | Park et al. simulated a society of 25 generative agents in a virtual town |
| 2023 | Voyager: An Open-Ended Embodied Agent with Large Language Models | arXiv | Wang et al. built a lifelong learning agent in Minecraft |
| 2023 | MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework | ICLR 2024 | Hong et al. standardized a multi-agent software development workflow |
| 2023 | Cognitive Architectures for Language Agents (CoALA) | TMLR 2024 | Sumers et al. proposed a cognitive architecture framework for language agents |
2.6 Evaluation and Benchmarks
| Year | Paper | Venue | Core Contribution |
| --- | --- | --- | --- |
| 2023 | AgentBench: Evaluating LLMs as Agents | ICLR 2024 | Liu et al. One of the first comprehensive benchmarks for evaluating LLMs as agents |
| 2023 | SWE-bench: Can Language Models Resolve Real-World GitHub Issues? | ICLR 2024 | Jimenez et al. Software engineering evaluation based on real GitHub issues |
| 2023 | WebArena: A Realistic Web Environment for Building Autonomous Agents | ICLR 2024 | Zhou et al. Realistic web environment for agent evaluation |
3. Classic Textbooks
| Book | Author | Year | Notes |
| --- | --- | --- | --- |
| Artificial Intelligence: A Modern Approach | Russell & Norvig | 1995/2020 | The "AI Bible," with an agent perspective throughout |
| An Introduction to MultiAgent Systems | Wooldridge | 2002/2009 | Classic textbook on multi-agent systems |
| Multiagent Systems | Shoham & Leyton-Brown | 2008 | Multi-agent algorithms and game theory |
| Speech and Language Processing | Jurafsky & Martin | 2000/2024 | NLP reference book with dialogue system chapters |
4. Paper Reading Roadmap
Beginner Level (Recommended in Order)
- Weng (2023) -- LLM Powered Autonomous Agents (blog)
- Wei et al. (2022) -- Chain-of-Thought
- Yao et al. (2022) -- ReAct
- Park et al. (2023) -- Generative Agents
- Shinn et al. (2023) -- Reflexion
- Yao et al. (2023) -- Tree of Thoughts
- Sumers et al. (2024) -- CoALA
- Schick et al. (2023) -- Toolformer
- Wang et al. (2023) -- Voyager
- Hong et al. (2023) -- MetaGPT
Advanced Level
- OpenAI (2024) -- o1 System Card
- DeepSeek (2025) -- DeepSeek-R1
- Anthropic (2024) -- Building Effective Agents
- AgentBench / SWE-bench evaluation papers
5. Key Research Teams
| Team/Institution | Key Researchers | Research Focus |
| --- | --- | --- |
| Princeton NLP | Karthik Narasimhan, Shunyu Yao | ReAct, ToT, SWE-bench |
| Stanford NLP | Percy Liang, Joon Sung Park | Generative Agents, HELM |
| CMU | Graham Neubig | Code agents, software engineering |
| OpenAI | Research team | GPT series, Function Calling, Operator |
| Anthropic | Research team | Claude, Constitutional AI |
| DeepMind | Research team | Gemini, AlphaCode |
| Microsoft Research | Research team | AutoGen, TaskWeaver |
| Tsinghua KEG | Jie Tang's team | AgentBench, ChatGLM |
References
- Weng, L. (2023). LLM Powered Autonomous Agents. lilianweng.github.io.
- Wei, J. et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. NeurIPS 2022.
- Yao, S. et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. ICLR 2023.
- Park, J.S. et al. (2023). Generative Agents: Interactive Simulacra of Human Behavior. UIST 2023.
- Shinn, N. et al. (2023). Reflexion: Language Agents with Verbal Reinforcement Learning. NeurIPS 2023.