Introduction to Human-Like Intelligence
The Core Question: Why Are Large Language Models Not Enough?
Over the past few years, Large Language Models (LLMs) have achieved remarkable success. From the GPT series to the Claude series, these models have demonstrated astonishing capabilities in text generation, code writing, logical reasoning, and other tasks. Yet a fundamental question remains unresolved:
Can language modeling alone lead us to true Artificial General Intelligence (AGI)?
An increasing number of leading researchers are answering this question in the negative.
The core paradigm of LLMs is next-token prediction: given a text sequence, predict the most likely next symbol (a toy version is sketched in code after the table below). This paradigm grants models powerful pattern-matching abilities at the linguistic level, but it also introduces fundamental limitations:
| Capability | LLM Performance | Human Performance |
|---|---|---|
| Linguistic reasoning | Strong | Strong |
| Physical intuition | Extremely weak | Innate |
| Causal understanding | Statistical correlation | Genuine causal inference |
| World model | Implicit, unstable | Explicit, actionable |
| Learning from few experiences | Weak | Extremely strong |
| Continuous adaptation to the environment | Nearly absent | Lifelong learning |
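To make the paradigm concrete, here is a minimal sketch of next-token prediction using a toy bigram count model; the corpus, the add-one smoothing, and all numbers are invented for illustration. Real LLMs replace the count table with a neural network, but the objective is the same: assign high probability to the true next symbol.

```python
import numpy as np

# Toy corpus and vocabulary (invented for illustration).
corpus = "the cat sat on the mat the cat ran".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

# A bigram "language model": count transitions, normalize to probabilities.
counts = np.ones((len(vocab), len(vocab)))  # add-one smoothing
for prev, nxt in zip(corpus, corpus[1:]):
    counts[idx[prev], idx[nxt]] += 1
probs = counts / counts.sum(axis=1, keepdims=True)

# Next-token prediction: given a context word, pick the most likely successor.
context = "the"
prediction = vocab[int(np.argmax(probs[idx[context]]))]
print(f"p(next | '{context}') peaks at '{prediction}'")

# The training signal for real LLMs is the same idea at scale:
# average cross-entropy of the true next token under the model.
nll = -np.mean([np.log(probs[idx[p], idx[n]]) for p, n in zip(corpus, corpus[1:])])
print(f"per-token cross-entropy: {nll:.3f} nats")
```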
A two-year-old child knows that a ball will fall to the ground, that water will spill from a cup, and that pushing an object will make it move. This knowledge is not learned from language — it is acquired through interaction with the physical world. LLMs have never "touched" anything; all of their knowledge comes from statistical patterns in textual symbols.
LeCun's Thesis: Language Models Are a Dead End for AGI
Yann LeCun is one of the three recipients of the 2018 Turing Award for deep learning and the Chief AI Scientist at Meta. Throughout 2025 and 2026, he has repeatedly articulated a pointed view:
LLMs are a dead end for AGI because they lack grounding in physical reality.
LeCun argues that language is an extremely compressed representational form of human knowledge. The volume of visual information a person receives through their eyes over a lifetime vastly exceeds the information contained in all the books ever written in human history. Language discards the overwhelming majority of details about the physical world. Therefore, learning solely from language can never yield a genuine understanding of physical reality.
LeCun has not merely offered criticism — he has also proposed an alternative: the Joint Embedding Predictive Architecture (JEPA). The central idea of JEPA is to perform prediction in latent space rather than at the pixel or token level. This enables models to learn more abstract and robust representations of the world.
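As a rough illustration of that idea (not LeCun's implementation: the "encoders" here are frozen random linear maps, and every shape and learning rate is invented), the sketch below trains a predictor to map a context's embedding onto a target's embedding. The loss is computed entirely in latent space; no pixels are ever reconstructed.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_LAT = 64, 16  # invented sizes
W_ctx = rng.normal(size=(D_IN, D_LAT)) / np.sqrt(D_IN)  # context encoder (frozen toy)
W_tgt = rng.normal(size=(D_IN, D_LAT)) / np.sqrt(D_IN)  # target encoder (frozen toy)
W_pred = np.zeros((D_LAT, D_LAT))                       # predictor, learned below

def encode(x, W):
    return x @ W

for step in range(500):
    x_ctx = rng.normal(size=(32, D_IN))                # e.g. a visible view of an input
    x_tgt = x_ctx + 0.1 * rng.normal(size=(32, D_IN))  # the held-out view to predict
    z_ctx, z_tgt = encode(x_ctx, W_ctx), encode(x_tgt, W_tgt)
    z_hat = z_ctx @ W_pred                        # predict target latent from context latent
    err = z_hat - z_tgt                           # error lives in latent space, not pixel space
    W_pred -= 0.01 * (z_ctx.T @ err) / len(err)   # descend the mean squared latent error

print("final latent MSE:", float(np.mean(err ** 2)))
```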
In 2025, LeCun spearheaded the creation of AMI Labs (Advanced Machine Intelligence Labs), which secured an extraordinary $1.03 billion in funding. The sheer scale of this investment signals that "moving beyond language models toward human-like intelligence" is no longer a niche academic discussion — it has become a direction that industry takes seriously.
The Landscape of Human-Like Intelligence Research
Human-Like Intelligence research is not a single topic but rather a vast system composed of multiple intersecting fields. We can organize it as a series of rings, starting from the most foundational philosophical questions and working outward toward concrete technical approaches:
Ring 1: Philosophical Foundations — What Is the Nature of Intelligence?
Before we set out to build human-like intelligence, we must first address a prerequisite question: What is intelligence? What is understanding?
The core concepts at this level include:
- Mental Model: The internal representation humans hold of how the world works. We do not passively receive sensory information; instead, we actively use internal models to predict, explain, and plan.
- Consciousness and Subjective Experience: Does intelligent behavior necessarily equate to genuine understanding? Thought experiments such as the Chinese Room and philosophical zombies remind us that a gap may exist between behavior and experience.
- Emergence: Can complex systems produce entirely new, irreducible properties from the interactions of simple components? Is intelligence itself an emergent phenomenon?
The philosophical foundations delineate the boundaries and direction of the entire research program: if we do not know what we are trying to build, the building process is blind.
Ring 2: Neuroscience Insights — How Does the Human Brain Do It?
The brain is the only known system that has achieved general intelligence. Studying how the brain works provides the most direct source of inspiration for human-like intelligence:
- Innate Priors: The human brain is not a blank slate. Newborns come equipped with prior knowledge of object permanence, simple physical laws, face recognition, and more. These innate structures are the product of millions of years of evolution.
- Predictive Coding: The brain's core mode of operation is not passive perception but the continuous generation of predictions about the next moment's sensory input, followed by updating the internal model based on the error between the actual input and the prediction (a minimal numerical sketch follows this list).
- Free Energy Principle and Active Inference: A theoretical framework proposed by Karl Friston, which holds that all brain activity can be understood as minimizing "free energy" (an upper bound on prediction error). Active inference goes further, proposing that organisms not only update their beliefs but also act on the environment to make it conform to their expectations.
- Neuromorphic Computing: Hardware designs that mimic the neural structure of the brain, pursuing a computational paradigm characterized by low power consumption, event-driven processing, and massive parallelism.
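The sketch below shows the predictive-coding loop in one dimension, under Gaussian assumptions; the generative mapping g(v) = v² and all numbers are invented. The agent holds a belief mu about a hidden cause, predicts the sensation that cause would produce, and descends the gradient of precision-weighted prediction error, which is also a gradient step on a free-energy-style bound.

```python
import numpy as np

# Toy generative model (assumed for illustration): a hidden cause v
# produces an observation o = v**2 plus Gaussian noise.
def g(v):   # how the world turns causes into sensations
    return v ** 2

def dg(v):  # its derivative, used to propagate prediction error
    return 2 * v

o = 2.0             # actual sensory input
mu = 1.0            # current belief about the hidden cause
mu_prior, s_prior, s_obs = 3.0, 1.0, 1.0  # prior mean and variances

# Free energy here is (up to constants) the precision-weighted sum of
# squared prediction errors: sensory error + deviation from the prior.
for _ in range(200):
    eps_obs = (o - g(mu)) / s_obs         # sensory prediction error
    eps_pri = (mu - mu_prior) / s_prior   # error of the belief against the prior
    dF_dmu = -eps_obs * dg(mu) + eps_pri  # gradient of free energy w.r.t. the belief
    mu -= 0.05 * dF_dmu                   # descend: perception as error minimization

print(f"posterior belief about the cause: mu = {mu:.3f}")
```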
Ring 3: World Models — The Core Path Toward Machine Understanding of the World
If LLMs learn a model of language, then human-like intelligence needs to learn a model of the world.
- World Model: A system capable of internally simulating the dynamics of the external world. Given a current state and an action, a world model can predict what the next state will be. This is precisely the core capacity of human thought: mentally "simulating" various possible scenarios (see the sketch after this list).
- JEPA (Joint Embedding Predictive Architecture): The architecture proposed by LeCun, whose central idea is to predict in latent space rather than in raw data space. This avoids the computational waste of pixel-level prediction and allows the model to focus on learning the abstract structure of the world.
- Spatial Intelligence and Learned Simulation: A direction led by Fei-Fei Li, who founded World Labs to build AI systems that understand and generate three-dimensional space. Spatial intelligence emphasizes that vision and 3D understanding are critical pathways toward world models.
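To make the world-model interface concrete, here is a toy sketch in which the dynamics, the scoring function, and all numbers are invented: a function from (state, action) to next state, used for planning by imagining random action sequences inside the model and keeping the one whose simulated outcome scores best.

```python
import numpy as np

rng = np.random.default_rng(1)

def world_model(state, action):
    """Stand-in for learned dynamics: state = (position, velocity),
    action = acceleration applied for one step."""
    pos, vel = state
    return (pos + vel, vel + action)

def score(state, goal=10.0):
    return -abs(state[0] - goal)  # closer to the goal position is better

# Planning by mental simulation: imagine random action sequences,
# roll each one forward inside the model, and pick the best.
best_plan, best_score = None, -np.inf
for _ in range(1000):
    plan = rng.uniform(-1, 1, size=5)
    state = (0.0, 0.0)
    for a in plan:
        state = world_model(state, a)
    if score(state) > best_score:
        best_plan, best_score = plan, score(state)

print("best imagined final score:", round(best_score, 3))
print("chosen actions:", np.round(best_plan, 2))
```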
Ring 4: Causality and Representation — Key Capabilities Beyond Correlation
Statistical learning can discover correlations in data, but correlation is not causation. Human-like intelligence requires deeper representational capabilities:
- Causal Learning: A direction championed by Yoshua Bengio and others, aimed at enabling AI systems to learn causal relationships between variables rather than merely statistical correlations. Causal models allow systems to answer counterfactual questions ("What would have happened if I had done X instead of Y?") and perform interventional reasoning (see the sketch after this list).
- Object-Centric Learning: Humans do not understand the world at the pixel level — they understand it at the level of "objects." We decompose scenes into individual entities, understanding their respective properties and mutual relationships. Object-centric learning seeks to endow AI with this same structured representational capability.
- Neuro-Symbolic AI: Combining the learning capacity of neural networks with the reasoning power of symbolic systems. Neural networks excel at extracting patterns from data; symbolic systems excel at logical reasoning and compositional generalization.
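The sketch below illustrates the correlation-versus-intervention distinction on an invented structural causal model: a hidden confounder Z drives both X and Y, so X and Y are strongly correlated, yet simulating the intervention do(X = x) shows that X has no causal effect on Y at all.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000

# Structural causal model (invented): Z -> X and Z -> Y, with no X -> Y edge.
def sample(do_x=None):
    z = rng.normal(size=N)                  # hidden confounder
    x = z + 0.3 * rng.normal(size=N) if do_x is None else np.full(N, do_x)
    y = 2 * z + 0.3 * rng.normal(size=N)    # y depends on z only, never on x
    return x, y

# Observational world: x and y look strongly related.
x, y = sample()
print("observational corr(x, y):", round(float(np.corrcoef(x, y)[0, 1]), 3))

# Interventional world: set x by fiat (the do-operator) and watch y.
_, y0 = sample(do_x=0.0)
_, y1 = sample(do_x=1.0)
print("E[y | do(x=1)] - E[y | do(x=0)]:", round(float(y1.mean() - y0.mean()), 3))
# ~0: intervening on x does nothing to y, despite the strong correlation.
```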
Ring 5: Embodiment and Intuitive Physics — Grounding in the Physical World
Understanding the world cannot rely on observation alone — it also requires acting within the world.
- Embodied Intelligence: Intelligence is not abstract computation divorced from the body; it is a process deeply coupled with the body and the environment. An embodied agent can learn through physical interaction with its environment, acquiring knowledge that cannot be obtained purely from data.
- Intuitive Physics: Research by Josh Tenenbaum and others has shown that humans possess an internal "physics engine" capable of mentally simulating the motion, collision, and support relationships of objects. This intuitive physics is a capability that begins developing in infants just a few months old.
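A toy version of that internal physics engine, in the spirit of probabilistic mental simulation (all numbers invented): simulate a ball's flight many times under uncertainty about the launch speed, then read a judgment off the distribution of simulated landing points.

```python
import numpy as np

rng = np.random.default_rng(7)
G, DT = -9.8, 0.01  # gravity (m/s^2) and simulation timestep (s)

def simulate(v0, angle_deg):
    """Simple projectile simulation: where does the ball land?"""
    vx = v0 * np.cos(np.radians(angle_deg))
    vy = v0 * np.sin(np.radians(angle_deg))
    x = y = 0.0
    while True:
        x, y, vy = x + vx * DT, y + vy * DT, vy + G * DT
        if y < 0:
            return x  # landing distance

# Mental simulation with perceptual uncertainty: the observer is unsure
# of the exact launch speed, so it samples and simulates many futures.
landings = np.array([simulate(rng.normal(10.0, 1.0), 45) for _ in range(2000)])
print(f"mean predicted landing: {landings.mean():.2f} m")
print(f"P(ball lands past 9 m): {(landings > 9.0).mean():.2f}")
```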
Ring 6: Meta-Learning and Self-Improvement — Learning to Learn
True intelligence is not merely about learning specific knowledge — it is about learning how to learn.
- Meta-Learning: Also known as "learning to learn," its goal is to enable systems to rapidly adapt when faced with new tasks, rather than training from scratch (a minimal sketch follows this list).
- Recursive Self-Improvement: A system capable of improving its own learning algorithm could, in theory, enter a cycle of accelerating progress. This is both one of the most anticipated properties of AGI and one of the risk sources that demands the greatest caution.
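As a minimal sketch of learning to learn, the code below implements the Reptile meta-learning update on a deliberately tiny one-parameter task family; the tasks and every hyperparameter are invented. Meta-training moves the initialization to a point from which a few gradient steps adapt well to any new task drawn from the family.

```python
import numpy as np

rng = np.random.default_rng(3)

def task():
    """Sample a task: 1-D regression y = a*x with a task-specific slope a."""
    a = rng.uniform(0.5, 1.5)
    x = rng.normal(size=20)
    return x, a * x

def adapt(w, x, y, steps=5, lr=0.1):
    """Inner loop: a few gradient steps of MSE on one task."""
    for _ in range(steps):
        w -= lr * 2 * np.mean((w * x - y) * x)
    return w

# Reptile outer loop: nudge the initialization toward each task's
# adapted weights, so future adaptation starts from a good place.
w_meta = 0.0
for _ in range(300):
    x, y = task()
    w_task = adapt(w_meta, x, y)
    w_meta += 0.1 * (w_task - w_meta)

# A new, unseen task: adaptation from w_meta is fast because the
# initialization already sits near the task family's optima.
x, y = task()
print(f"meta-learned init: {w_meta:.2f} (task slopes range over [0.5, 1.5])")
print(f"loss after 5 steps from meta init: {np.mean((adapt(w_meta, x, y) * x - y) ** 2):.4f}")
```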
Key Figures
| Figure | Core Contribution | Representative Project / Theory |
|---|---|---|
| Yann LeCun | World models, moving beyond LLMs | JEPA architecture, AMI Labs ($1.03 billion) |
| Karl Friston | A unifying theory of brain function | Free Energy Principle, Active Inference |
| Josh Tenenbaum | Computational modeling of human cognition | Intuitive physics, probabilistic programs |
| Yoshua Bengio | Causal representation learning | Causal learning, System 2 deep learning |
| Fei-Fei Li | Spatial intelligence and 3D understanding | World Labs, spatial intelligence |
The Current Landscape
Between 2025 and 2026, the AI field is undergoing a profound paradigm shift. The dominant theme of recent years has been "scaling up language models," but now an increasing share of funding and talent is flowing toward a new direction:
Building world models grounded in physical reality.
LeCun's AMI Labs secured $1.03 billion in funding; Fei-Fei Li's World Labs attracted substantial investment; Friston's active inference framework is being adopted by a growing number of robotics research teams. Together, these signals point to a shared conclusion: scaling language models alone is insufficient to achieve AGI.
Of course, this does not mean that LLMs are without value. Language models remain extremely powerful for text comprehension, code generation, knowledge retrieval, and other tasks. But if the goal is general intelligence that genuinely understands the world, adapts to physical environments, and reasons with common sense, then we need to take a different path.
The starting point of that path is a deep understanding of human intelligence itself. From philosophical inquiry to neuroscience insights, from world models to causal reasoning, from embodied interaction to meta-learning — each link is an indispensable part of this journey. This note series will explore each of these topics in turn.