Deep Learning Frontier Trends
Overview
Deep learning is evolving toward greater efficiency, broader generality, and closer approximation of human cognition. This chapter surveys test-time compute, neural architecture search, Kolmogorov-Arnold Networks (KAN), Liquid Neural Networks, neuromorphic computing, and other cutting-edge directions.
1. Test-Time Compute
1.1 Core Idea
Traditional paradigm: More training compute → better performance.
New paradigm: More compute at inference time → better performance.
1.2 Main Approaches
Chain-of-Thought (CoT):
- Let the model reason step by step
- Explicitly generate intermediate steps
- Simple prompting substantially improves performance on math and logic tasks (see the sketch below)
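As an illustration, a minimal CoT-style prompt might look like the following sketch; the arithmetic problem and trigger phrase are illustrative, and no specific model or API is assumed:

```python
# A minimal chain-of-thought prompt; the problem and trigger phrase are
# illustrative, and the model call itself is omitted.
prompt = (
    "Q: A store sells pens at $3 each. Tom buys 4 pens and pays with a $20 bill. "
    "How much change does he get?\n"
    "A: Let's think step by step."
)
# A CoT-capable model is expected to emit intermediate steps before the answer,
# e.g.: "4 pens x $3 = $12; $20 - $12 = $8. Answer: $8."
```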
Best-of-N Sampling:
- Generate N candidate answers
- Use a verifier to select the best
- Simple but effective (sketched below)
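A minimal sketch of the procedure, assuming hypothetical `sample` (one LLM call) and `verify` (scores a candidate) helpers:

```python
# Best-of-N sampling sketch. `sample` and `verify` are hypothetical helpers
# standing in for an LLM call and a learned or rule-based verifier.
def best_of_n(prompt, sample, verify, n=8):
    candidates = [sample(prompt) for _ in range(n)]  # draw N independent samples
    return max(candidates, key=verify)               # keep the highest-scoring one
```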
Search-Based Reasoning (Tree/Graph of Thought):
- Build reasoning trees
- Search across multiple reasoning paths
- Backtracking and exploration (see the beam-search sketch below)
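One way to realize this is a beam search over partial reasoning paths; the sketch below assumes hypothetical `generate` (proposes next steps) and `score` (rates a partial path) helpers backed by an LLM:

```python
# Beam-search sketch over reasoning paths (Tree-of-Thought style).
# `generate(path)` and `score(path)` are hypothetical LLM-backed helpers.
def tree_search(question, generate, score, width=3, depth=3):
    beams = [[question]]                             # each path is a list of steps
    for _ in range(depth):
        expanded = [path + [step] for path in beams for step in generate(path)]
        beams = sorted(expanded, key=score, reverse=True)[:width]  # prune to top-k
    return max(beams, key=score)                     # best complete path
```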
1.3 DeepSeek-R1 and Reasoning Models
- o1/o3: OpenAI's reasoning models, trained with RL for long-chain reasoning
- DeepSeek-R1: Open-source reasoning model, trained with GRPO (Group Relative Policy Optimization)
- Core Idea: Teach models to "think longer" at inference time
1.4 Inference Compute Scaling Law
Performance improves as more inference tokens are spent, with gains especially pronounced on math, coding, and reasoning tasks.
2. Neural Architecture Search (NAS)
2.1 Basic Concept
Automatically search for optimal neural network architectures, replacing manual design.
Search Space:
- Layer types (convolution, attention, MLP, etc.)
- Connection patterns
- Hyperparameters (channels, kernel size, etc.)
2.2 Main Methods
| Method | Description | Representative |
|---|---|---|
| Reinforcement Learning | Use RL to search architectures | NASNet |
| Evolutionary | Mutation + selection | AmoebaNet |
| Differentiable | Relax discrete choices to continuous optimization | DARTS |
| Supernet | Train one network containing all sub-architectures | Once-for-All |
2.3 DARTS
DARTS replaces each discrete operation choice on an edge \((i, j)\) with a softmax-weighted mixture over the candidate set \(\mathcal{O}\):

\[
\bar{o}^{(i,j)}(x) = \sum_{o \in \mathcal{O}} \frac{\exp\big(\alpha_o^{(i,j)}\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\alpha_{o'}^{(i,j)}\big)}\, o(x)
\]

where \(\alpha\) are architecture parameters; weights and architecture are learned jointly through bilevel optimization.
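A minimal sketch of the mixed operation in PyTorch; the candidate operations and tensor sizes are illustrative:

```python
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    """DARTS continuous relaxation: softmax-weighted sum of candidate ops."""
    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)                     # candidate ops on one edge
        self.alpha = nn.Parameter(torch.zeros(len(ops)))  # architecture parameters

    def forward(self, x):
        weights = torch.softmax(self.alpha, dim=0)        # relax the discrete choice
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# Illustrative candidate set: identity, 3x3 conv, average pooling (shapes match).
ops = [nn.Identity(),
       nn.Conv2d(16, 16, 3, padding=1),
       nn.AvgPool2d(3, stride=1, padding=1)]
edge = MixedOp(ops)
y = edge(torch.randn(1, 16, 8, 8))                        # mixed output of the edge
```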
2.4 Current Status
- In the LLM era, NAS is mainly used to search for efficient architectures (e.g., mobile-scale models)
- Hardware-aware NAS: incorporates hardware constraints such as latency and power consumption
- NAS is also applied to search optimal configurations within the Transformer architecture (number of layers, heads, FFN size, etc.)
3. KAN (Kolmogorov-Arnold Networks)
3.1 Theoretical Foundation
Kolmogorov-Arnold Representation Theorem:
Any multivariate continuous function \(f(x_1, \ldots, x_n)\) on a bounded domain can be represented as:

\[
f(x_1, \ldots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
\]

where \(\phi_{q,p}\) and \(\Phi_q\) are univariate continuous functions.
3.2 KAN vs MLP
| Feature | MLP | KAN |
|---|---|---|
| Learnable component | Weights (scalars on edges) | Activation functions (functions on edges) |
| Fixed component | Activation functions | Summation nodes |
| Approximation theorem | Universal Approximation | Kolmogorov-Arnold |
| Interpretability | Low | Higher |
| Parameter efficiency | Average | Potentially better (certain tasks) |
3.3 KAN Implementation
Edge activation functions are parameterized with B-splines:

\[
\phi(x) = w_b\, b(x) + w_s \sum_i c_i B_i(x)
\]

where \(b(x)\) is a fixed basis function (e.g., \(\operatorname{silu}(x)\)), \(B_i\) are B-spline basis functions, and the coefficients \(c_i\) (along with \(w_b, w_s\)) are learned. Each edge thus learns a univariate spline function rather than a scalar weight.
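A minimal numerical sketch of a single KAN edge using SciPy's B-spline evaluation; the grid size, spline order, and silu basis follow common choices, but the exact parameterization here is illustrative rather than the reference implementation:

```python
import numpy as np
from scipy.interpolate import BSpline

# One KAN edge: phi(x) = w_b * b(x) + w_s * spline(x), with learnable c_i, w_b, w_s.
k = 3                                          # cubic splines
grid = np.linspace(-2, 2, 8)                   # grid over the expected input range
knots = np.concatenate([[grid[0]] * k, grid, [grid[-1]] * k])
coeffs = np.random.randn(len(knots) - k - 1)   # spline coefficients c_i (learned)
w_b, w_s = 1.0, 1.0                            # mixing weights (learned)

def silu(x):
    return x / (1.0 + np.exp(-x))              # fixed basis function b(x)

def kan_edge(x):
    spline = BSpline(knots, coeffs, k, extrapolate=True)
    return w_b * silu(x) + w_s * spline(x)     # a spline replaces the scalar weight

y = kan_edge(np.linspace(-2, 2, 5))
```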
3.4 Current Status
- Strengths: Potential in scientific computing and interpretability
- Limitations: No demonstrated advantage on large-scale tasks (LLMs) yet
- Active Research: Integration with Transformers, Graph KAN, time series KAN
4. Liquid Neural Networks
4.1 Core Idea
Continuous-time neural networks inspired by biological nervous systems (C. elegans).
Liquid Time-Constant Networks (LTC):

\[
\frac{d\mathbf{x}(t)}{dt} = -\left[ \frac{1}{\tau} + f\big(\mathbf{x}(t), \mathbf{I}(t), t, \theta\big) \right] \mathbf{x}(t) + f\big(\mathbf{x}(t), \mathbf{I}(t), t, \theta\big)\, A
\]

where \(\tau\) is the time constant, \(A\) is a bias parameter, and \(f\) is a learned nonlinear function; the effective time constant therefore varies with the input ("liquid").
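A minimal sketch of these dynamics for a single scalar unit, integrated with the Euler method; the parameters of \(f\) would be learned in practice and are fixed here purely for illustration:

```python
import numpy as np

# Euler integration of one scalar LTC unit:
#   dx/dt = -[1/tau + f(x, I)] * x + f(x, I) * A
tau, A = 1.0, 0.5          # time constant and bias (learned in practice)
w_x, w_i = 0.8, 1.2        # illustrative parameters of the learned nonlinearity f

def f(x, I):
    return np.tanh(w_x * x + w_i * I)

x, dt = 0.0, 0.01
for t in np.arange(0.0, 1.0, dt):
    I = np.sin(2 * np.pi * t)                    # time-varying input signal
    fx = f(x, I)
    x += dt * (-(1.0 / tau + fx) * x + fx * A)   # LTC state update
```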
4.2 Characteristics
- Causality: Naturally handles causal reasoning
- Continuous time: Adapts to irregular sampling
- Compact: Requires very few neurons (a 19-neuron LTC network learned lane keeping for autonomous driving)
- Interpretable: Simple structure, analyzable
4.3 Applications
- Autonomous driving
- Time series forecasting
- Robot control
- Weather prediction
4.4 CfC (Closed-form Continuous-time)
Replaces numerical ODE solving with an approximate closed-form solution of the LTC dynamics, substantially reducing inference cost.
5. Neuromorphic Computing
5.1 Spiking Neural Networks (SNN)
Simulate biological neurons' spike-firing mechanism:
LIF (Leaky Integrate-and-Fire) Model:

\[
\tau_m \frac{dV}{dt} = -\big(V - V_{\text{rest}}\big) + R\, I(t)
\]

When \(V > V_{\text{thresh}}\), the neuron fires a spike and the membrane potential resets to \(V_{\text{reset}}\).
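A minimal Euler-method simulation of a single LIF neuron; the constants are illustrative:

```python
import numpy as np

# Simulate one LIF neuron: tau_m * dV/dt = -(V - V_rest) + R * I(t)
tau_m, R = 10.0, 1.0                               # membrane time constant (ms), resistance
V_rest, V_thresh, V_reset = -70.0, -55.0, -75.0    # potentials (mV)
V, dt = V_rest, 0.1                                # initial potential, time step (ms)
spikes = []

for step in range(1000):
    I = 20.0                                       # constant input current
    V += dt / tau_m * (-(V - V_rest) + R * I)      # Euler update of membrane potential
    if V > V_thresh:                               # threshold crossing
        spikes.append(step * dt)                   # record spike time
        V = V_reset                                # reset after firing
```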
5.2 vs Artificial Neural Networks
| Feature | ANN | SNN |
|---|---|---|
| Information encoding | Continuous values | Spike trains |
| Computation | Matrix multiplication | Event-driven |
| Energy consumption | High | Much lower (event-driven, sparse activity) |
| Hardware | GPU | Neuromorphic chips |
| Training | Backpropagation | Surrogate gradient/STDP |
| Temporal modeling | Requires design | Naturally temporal |
5.3 Neuromorphic Chips
| Chip | Developer | Features |
|---|---|---|
| Loihi 2 | Intel | 128 cores, 1M neurons |
| TrueNorth | IBM | 1M neurons, 70mW power |
| SpiNNaker 2 | TU Dresden / Univ. of Manchester | Large-scale simulation |
| Akida | BrainChip | Edge AI |
5.4 Challenges and Prospects
- Training difficulty: Spikes are non-differentiable, requiring surrogate gradients (sketched after this list)
- Accuracy gap: Underperforms ANNs on most tasks
- Use cases: Ultra-low-power edge devices, event cameras
- Future directions: ANN-SNN hybrids, large-scale SNNs
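To make the training-difficulty point concrete, here is a minimal sketch of a surrogate-gradient spike function in PyTorch, using the derivative of a scaled arctan as the backward-pass surrogate (a common but not unique choice):

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass; smooth surrogate in the backward pass."""
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()                          # non-differentiable spike

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        surrogate = 1.0 / (1.0 + (torch.pi * v) ** 2)   # d/dv of (1/pi)*arctan(pi*v)
        return grad_out * surrogate

v = torch.randn(4, requires_grad=True)
spikes = SurrogateSpike.apply(v)
spikes.sum().backward()                                 # gradients flow via the surrogate
```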
6. Other Frontier Directions
6.1 World Models
- Learn internal models of environments for prediction and planning
- JEPA architecture (proposed by Yann LeCun)
- Genie 2 (DeepMind): Interactive 3D world generation
6.2 Retrieval-Augmented Generation (RAG)
- Combine external knowledge bases with LLMs
- Reduce hallucinations, update knowledge
- Vector retrieval + reranking + generation (see the sketch below)
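A minimal retrieve-then-generate sketch; reranking is omitted for brevity, and `embed` and `generate` are hypothetical stand-ins for an embedding model and an LLM:

```python
import numpy as np

# Retrieve-then-generate sketch; `embed` and `generate` are hypothetical models.
def retrieve(query, docs, embed, k=3):
    q = embed(query)
    emb = [embed(d) for d in docs]
    scores = [q @ e / (np.linalg.norm(q) * np.linalg.norm(e)) for e in emb]
    top = np.argsort(scores)[::-1][:k]         # highest cosine similarity first
    return [docs[i] for i in top]

def rag_answer(query, docs, embed, generate):
    context = "\n".join(retrieve(query, docs, embed))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)                    # generation grounded in retrieved text
```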
6.3 AI Agents
- Tool use (function calling; see the loop sketch after this list)
- Multi-step reasoning and planning
- Code execution environments
- Multi-agent collaboration
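A minimal tool-calling loop sketch; `llm` is a hypothetical function that returns either a tool call or a final answer, and the calculator is a toy tool:

```python
# Minimal agent loop. `llm` is a hypothetical function that inspects the history
# and returns either {"tool": name, "args": ...} or {"answer": ...}.
TOOLS = {"calculator": lambda expr: str(eval(expr))}     # toy tool registry

def run_agent(task, llm, max_steps=5):
    history = [task]
    for _ in range(max_steps):
        action = llm(history)
        if "answer" in action:                           # the model decided to stop
            return action["answer"]
        result = TOOLS[action["tool"]](action["args"])   # execute the tool call
        history.append(f"{action['tool']} -> {result}")  # feed the observation back
    return None                                          # step budget exhausted
```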
6.4 Embodied Intelligence
- Vision-Language-Action models (VLA)
- RT-2, Octo, and other robot foundation models
- Sim-to-real transfer
7. Summary
```mermaid
graph TD
    A[DL Frontiers] --> B[Inference Scaling]
    A --> C[Architecture Innovation]
    A --> D[Compute Paradigms]
    A --> E[Application Directions]
    B --> B1[Test-Time Compute]
    B --> B2[Reasoning Models o1/R1]
    C --> C1[KAN]
    C --> C2[Liquid Networks]
    C --> C3[NAS]
    D --> D1[Neuromorphic Computing]
    D --> D2[Analog Computing]
    E --> E1[World Models]
    E --> E2[AI Agents]
    E --> E3[Embodied Intelligence]
```
Future Trend Predictions:
- Inference compute will become as important as training compute
- Hybrid architectures will continue to develop (Transformer + SSM + MoE)
- Neuromorphic computing has opportunities in edge scenarios
- AI Agents will become the primary application paradigm
- Embodied intelligence is a long-term important direction
References
- Wei et al., "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models," NeurIPS 2022
- Liu et al., "KAN: Kolmogorov-Arnold Networks," 2024
- Hasani et al., "Liquid Time-constant Networks," AAAI 2021
- Roy et al., "Towards Spike-Based Machine Intelligence with Neuromorphic Computing," Nature 2019
- Liu et al., "DARTS: Differentiable Architecture Search," ICLR 2019