Automated Reasoning
Automated reasoning is the field within artificial intelligence that studies how to make computers automatically perform logical deduction. It encompasses a broad range of techniques from Boolean satisfiability (SAT) solving to formal theorem proving and serves as a foundational tool for software verification, hardware design, mathematical proof, and AI safety.
In recent years, the convergence of deep learning and formal methods (e.g., AlphaProof, neural theorem proving) has ushered in a profound transformation in automated reasoning. These notes cover SAT/SMT solving, theorem proving, formal verification, and cutting-edge directions at the intersection of formal methods and deep learning.
1. Classification of Reasoning Systems
| Dimension | Type | Description |
|---|---|---|
| By logic level | Propositional logic reasoning | Variables take only true/false values |
| By logic level | First-order logic reasoning | Supports quantifiers (\(\forall, \exists\)) and predicates |
| By logic level | Higher-order logic reasoning | Variables can themselves be functions or predicates |
| By automation level | Fully automated | SAT/SMT solvers |
| By automation level | Semi-automated (interactive) | Lean, Coq, Isabelle |
| By reasoning direction | Forward reasoning | Derive new facts from axioms and known theorems |
| By reasoning direction | Backward reasoning | Decompose the goal into subgoals |
2. The SAT Problem and Solvers
2.1 Boolean Satisfiability
The SAT problem: Given a Boolean formula \(\varphi\) (typically in CNF), determine whether there exists a variable assignment that makes \(\varphi\) true.
- SAT was the first problem proven to be NP-complete (Cook-Levin theorem, 1971)
- Despite exponential worst-case complexity, modern SAT solvers perform remarkably well on practical instances
2.2 The DPLL Algorithm
DPLL (Davis-Putnam-Logemann-Loveland, 1962) is the earliest systematic SAT solving algorithm:
```text
DPLL(φ):
  1. Unit propagation:
     if a clause contains only one literal, that literal must be set to true
  2. Pure literal elimination:
     if a variable appears only positively or only negatively, assign it accordingly
  3. Branching:
     choose a variable, try assigning true and false, and solve recursively
  4. Backtracking:
     if the current assignment leads to a conflict, backtrack to the previous decision point
```
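The steps above can be sketched as a short recursive procedure. This is a minimal illustration, not a production solver: clauses are DIMACS-style sets of non-zero integers (variable `x` as `x`, its negation as `-x`), the function names are mine, and pure-literal elimination is omitted to keep the sketch small.

```python
from typing import Optional

def simplify(clauses, lit):
    """Assign literal `lit` true: drop satisfied clauses, shrink the rest."""
    out = []
    for c in clauses:
        if lit in c:
            continue            # clause satisfied, drop it
        out.append(c - {-lit})  # falsified literal removed from clause
    return out

def dpll(clauses: list, assignment: dict) -> Optional[dict]:
    """Return a satisfying assignment {var: bool} for a CNF, or None (UNSAT)."""
    # Unit propagation: a one-literal clause forces that literal true.
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if len(clause) == 1:
                lit = next(iter(clause))
                clauses = simplify(clauses, lit)
                assignment[abs(lit)] = lit > 0
                changed = True
                break
    if not clauses:                 # every clause satisfied
        return assignment
    if frozenset() in clauses:      # empty clause derived: conflict
        return None
    # Branching: pick a variable, try true then false, recurse.
    var = abs(next(iter(next(iter(clauses)))))
    for value in (var, -var):
        result = dpll(simplify(clauses, value), {**assignment, var: value > 0})
        if result is not None:
            return result
    return None                     # both branches failed: backtrack
```

Backtracking is implicit in the recursion: a failed branch simply returns `None` and the caller tries the opposite assignment.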
2.3 The CDCL Algorithm
CDCL (Conflict-Driven Clause Learning) is the core algorithm of modern SAT solvers, extending DPLL with:
| Enhancement | Description |
|---|---|
| Conflict analysis | When a conflict occurs, analyze its cause and learn a conflict clause |
| Non-chronological backtracking | Backtrack directly to the root cause of the conflict (not just the previous decision point) |
| Clause learning | Add conflict clauses to the formula to avoid redundant search |
| Variable selection heuristics | Strategies such as VSIDS (Variable State Independent Decaying Sum) |
| Restart policies | Periodically restart the search while retaining learned clauses |
Representative solvers: MiniSat, CaDiCaL, Kissat, Z3 (SAT component)
SAT Competition
The annual SAT Competition has driven rapid advances in solver technology. Modern solvers can handle instances with millions of variables.
2.4 WalkSAT
WalkSAT is a randomized local search algorithm:
- Randomly generate an initial assignment
- Select an unsatisfied clause
- With probability \(p\), randomly flip a variable in that clause; with probability \(1-p\), flip the variable that minimizes the number of unsatisfied clauses
- Repeat until a satisfying assignment is found or a limit is reached
Effective for finding satisfying assignments but cannot prove unsatisfiability.
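The loop above translates almost directly into code. A compact sketch, with function names and the greedy scoring of my own choosing (clauses are tuples of signed integer literals, as in the DPLL sketch):

```python
import random

def walksat(clauses, n_vars, p=0.5, max_flips=10_000, seed=0):
    """Randomized local search; returns {var: bool} or None if budget runs out."""
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}

    def sat(clause):
        return any(assign[abs(l)] == (l > 0) for l in clause)

    def num_unsat():
        return sum(not sat(c) for c in clauses)

    for _ in range(max_flips):
        unsat = [c for c in clauses if not sat(c)]
        if not unsat:
            return assign                      # all clauses satisfied
        clause = rng.choice(unsat)             # pick an unsatisfied clause
        if rng.random() < p:
            var = abs(rng.choice(clause))      # random-walk step
        else:
            # Greedy step: flip the variable that minimizes unsatisfied clauses.
            best = None
            for l in clause:
                v = abs(l)
                assign[v] = not assign[v]      # tentatively flip
                score = num_unsat()
                assign[v] = not assign[v]      # undo
                if best is None or score < best[0]:
                    best = (score, v)
            var = best[1]
        assign[var] = not assign[var]
    return None   # budget exhausted -- says nothing about unsatisfiability
```

Note that returning `None` here is inconclusive, which is exactly the asymmetry described above: local search can certify SAT but never UNSAT.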
3. SMT Solvers
3.1 Satisfiability Modulo Theories
SMT (Satisfiability Modulo Theories) generalizes SAT to richer logical theories:
| Background Theory | Abbreviation | Example |
|---|---|---|
| Linear Integer/Real Arithmetic | LIA/LRA | \(x + 2y \leq 5\) |
| Bitvectors | BV | 32-bit integer overflow detection |
| Arrays | AX | \(\text{read}(\text{write}(a, i, v), i) = v\) |
| Uninterpreted Functions | UF | Congruence: \(x = y \Rightarrow f(x) = f(y)\) |
| Strings | S | Regular expression matching |
3.2 The DPLL(T) Framework
Modern SMT solvers adopt the DPLL(T) framework: a SAT solver handles the Boolean skeleton while a theory solver checks theory consistency.
```text
SAT solver  ←→  theory solver (T)
    Boolean assignment  →  T-consistency check
    conflict clause     ←  T-conflict explanation
```
3.3 Introduction to Z3
Z3 (Microsoft Research) is the most widely used SMT solver:
```python
from z3 import *

x, y = Ints('x y')
s = Solver()
s.add(x + y == 10)
s.add(x > 0, y > 0)
s.add(x > y)
if s.check() == sat:
    print(s.model())  # e.g. [x = 6, y = 4]
```
Application domains: program verification, symbolic execution, constraint solving, test generation.
4. Theorem Proving
4.1 Automated Theorem Proving
Resolution is the foundation of automated theorem proving:
- Negate the proposition to be proved and convert to CNF
- Repeatedly apply the resolution rule: from \((A \lor C)\) and \((\neg A \lor D)\), derive \((C \lor D)\)
- If the empty clause \(\Box\) is derived, the original proposition holds (proof by contradiction)
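The refutation loop above can be sketched for the propositional case in a few lines (function names are mine; literals are signed integers as in the earlier SAT sketches, and the goal must already be negated and added to the clause set):

```python
from itertools import combinations

def resolve(c1: frozenset, c2: frozenset):
    """Yield all resolvents of two clauses: from (A ∨ C) and (¬A ∨ D), derive (C ∨ D)."""
    for lit in c1:
        if -lit in c2:
            yield (c1 - {lit}) | (c2 - {-lit})

def resolution_refutes(clauses) -> bool:
    """Saturate under resolution; True iff the empty clause is derivable,
    i.e. the clause set is unsatisfiable."""
    known = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for r in resolve(c1, c2):
                r = frozenset(r)
                if not r:
                    return True        # empty clause: contradiction found
                if r not in known:
                    new.add(r)
        if not new:
            return False               # saturated without the empty clause
        known |= new
```

For example, to prove \(P\) from \(P \lor Q\) and \(\neg Q\), refute the set \(\{P \lor Q,\ \neg Q,\ \neg P\}\): resolving the first two gives \(P\), which resolves with \(\neg P\) to the empty clause.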
| Automated Theorem Prover | Characteristics |
|---|---|
| E Prover | First-order logic, equational reasoning |
| Vampire | First-order logic, competition winner |
| SPASS | First-order logic, ordering-based strategy |
4.2 Interactive Theorem Proving
Interactive theorem provers (ITPs) require humans to provide proof strategies while the computer verifies the correctness of each step.
| Proof Assistant | Logical Foundation | Language | Notable Projects |
|---|---|---|---|
| Lean | Dependent type theory (CIC variant) | Lean 4 | Mathlib (100k+ theorems) |
| Coq | Calculus of Inductive Constructions (CIC) | Gallina | CompCert (verified C compiler) |
| Isabelle | Higher-order logic (HOL) | Isar | seL4 (verified microkernel) |
| Agda | Dependent type theory | Agda | Type theory research |
Lean 4 example (proving \(0 + n = n\), which, unlike \(n + 0 = n\), does not hold by mere computation and genuinely requires induction):
```lean
theorem zero_add (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ n ih => rw [Nat.add_succ, ih]
```
The Mathlib Project
Mathlib is the largest formalized mathematics library for Lean 4, covering algebra, analysis, topology, number theory, and more, with well over 100,000 formalized theorems and definitions.
5. Formal Verification
5.1 Model Checking
Model checking automatically verifies whether a finite-state system satisfies a given temporal logic property.
Input: System model \(M\) (finite state machine) + property \(\phi\) (e.g., a CTL/LTL formula)
Output: \(M \models \phi\) (satisfies) or a counterexample trace
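For safety properties ("nothing bad is reachable"), model checking reduces to graph reachability over the state space. A minimal explicit-state sketch (the API shape and function names are my own illustration; real tools add symbolic representations and full temporal logic):

```python
from collections import deque

def check_invariant(init, transitions, invariant):
    """Breadth-first check that every reachable state satisfies `invariant`.

    init: iterable of initial states; transitions(s) -> iterable of successors.
    Returns (True, None), or (False, trace) where trace is a counterexample
    path from an initial state to a violating state.
    """
    init = list(init)
    frontier = deque((s, (s,)) for s in init)
    seen = set(init)
    while frontier:
        state, trace = frontier.popleft()
        if not invariant(state):
            return False, list(trace)      # shortest counterexample (BFS order)
        for nxt in transitions(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, trace + (nxt,)))
    return True, None
```

Returning the violating path, not just a verdict, mirrors what makes model checkers practical: the counterexample trace tells the engineer exactly how the bad state is reached.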
| Tool | Application Domain |
|---|---|
| SPIN | Concurrent protocol verification |
| NuSMV | Hardware design verification |
| CBMC | Bounded model checking for C/C++ programs |
| TLA+ | Distributed system design (used extensively at Amazon) |
5.2 Program Verification
| Method | Description | Tools |
|---|---|---|
| Hoare logic | Derive program correctness using preconditions/postconditions | Dafny, Frama-C |
| Symbolic execution | Execute programs with symbolic values instead of concrete ones | KLEE, angr |
| Abstract interpretation | Analyze program behavior over abstract domains | Astrée |
| Bounded model checking | Unroll loops to a finite bound and reduce to SAT/SMT | CBMC |
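To make the bounded-model-checking row concrete, here is a toy sketch for a program whose assertion fails on odd inputs. Exhaustive input enumeration stands in for the single SAT/SMT query a real tool such as CBMC would emit; the function name and program are my own illustration:

```python
def bmc_counter(k: int, inputs):
    """Bounded model checking sketch for the toy program

        x = n
        while x > 0: x = x - 2
        assert x == 0   (checked when the loop has exited)

    The loop is unrolled k times. Each candidate n plays the role of a
    symbolic input; a real BMC tool would encode the unrolled program as
    one SAT/SMT formula instead of enumerating.
    Returns a failing input, or None if no violation is found within bound k.
    """
    for n in inputs:
        x = n
        for _ in range(k):          # unroll the loop to depth k
            if x > 0:
                x -= 2
        if x <= 0 and x != 0:       # loop exited within bound, assertion broken
            return n
    return None
```

Note the "bounded" caveat: a `None` result only rules out violations reachable within `k` unrollings, which is why BMC is a bug-finding method rather than a full proof by default.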
6. Formal Methods in AI
6.1 Neural Network Verification
Verify the robustness of neural networks: for every perturbed input \(x'\) within an \(\epsilon\)-neighborhood of \(x\), does the network still produce the same prediction?
| Verification Tool | Approach |
|---|---|
| \(\alpha,\beta\)-CROWN | Branch and bound + linear relaxation |
| Marabou | SMT-based verification |
| ERAN | Abstract interpretation |

The annual VNN-COMP (Verification of Neural Networks Competition) benchmarks these tools against each other.
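The simplest sound (but incomplete) technique in this family is interval bound propagation: push the input box through the network, taking worst-case interval ends at each weight. A minimal sketch with my own encoding (layers as `(weights, biases)` nested lists, ReLU on hidden layers):

```python
def ibp_forward(layers, lo, hi):
    """Propagate elementwise lower/upper bounds through a ReLU network.

    The returned bounds are sound: every input in [lo, hi] maps to an
    output inside [out_lo, out_hi] (though the box may be loose).
    """
    for idx, (W, b) in enumerate(layers):
        new_lo, new_hi = [], []
        for row, bias in zip(W, b):
            # Each weight picks the interval end that minimizes / maximizes it.
            l = bias + sum(w * (lo[j] if w >= 0 else hi[j]) for j, w in enumerate(row))
            h = bias + sum(w * (hi[j] if w >= 0 else lo[j]) for j, w in enumerate(row))
            new_lo.append(l)
            new_hi.append(h)
        if idx < len(layers) - 1:                     # ReLU on hidden layers only
            new_lo = [max(0.0, v) for v in new_lo]
            new_hi = [max(0.0, v) for v in new_hi]
        lo, hi = new_lo, new_hi
    return lo, hi

def certify_robust(layers, x, eps, target: int) -> bool:
    """True if the target logit provably beats all others on the eps-box.
    A sufficient check: False means "could not certify", not "not robust"."""
    lo, hi = ibp_forward(layers, [v - eps for v in x], [v + eps for v in x])
    return all(lo[target] > hi[j] for j in range(len(lo)) if j != target)
```

Tools like \(\alpha,\beta\)-CROWN start from much tighter linear relaxations than this box, then branch and bound on unstable ReLUs; the soundness argument is the same.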
6.2 LLM Output Verification
- Code verification: Use formal methods to verify that LLM-generated code satisfies specifications
- Reasoning chain verification: Verify the logical correctness of each step in Chain-of-Thought reasoning
- Fact checking: Convert factual claims in LLM outputs into verifiable logical propositions
7. Convergence with Deep Learning
7.1 AlphaProof
AlphaProof (DeepMind, 2024) applies AlphaZero-style reinforcement learning to mathematical theorem proving:
- Together with AlphaGeometry 2, solved 4 of the 6 problems of IMO 2024 (silver-medal standard), working in the Lean 4 formal environment
- Models proof search as a game-like search problem
- Uses LLMs to generate proof candidates, verified by the formal system
7.2 Neural Theorem Proving
| System | Method | Target |
|---|---|---|
| GPT-f | Transformer generates proof steps | Lean/Metamath proofs |
| HTPS | Hyper-tree search + neural networks | Lean proof search |
| LeanDojo | LLM + Lean interactive environment | Open research platform |
| Llemma | Mathematics-specialized LLM | Mathematical reasoning and proof |
| DeepSeek-Prover | Reinforcement learning + formal verification | Lean theorem proving |
Core approach:
- A language model generates proof tactics
- A formal system verifies the correctness of each step
- A search algorithm (e.g., MCTS) explores the proof space
- Reinforcement learning learns from successful proofs
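Stripped of the learning component, this loop is a guided search over proof states. A deliberately toy skeleton: a hand-written `score` stands in for the policy/value network, `tactics` for the tactic generator, and the equality check against `goal` for kernel verification (all names and the toy domain are my own):

```python
import heapq

def best_first_prove(start, goal, tactics, score, max_nodes=1000):
    """Best-first search over states; returns the tactic sequence or None.

    tactics(state) -> iterable of (tactic_name, next_state);
    score(state)   -> lower is more promising (the "learned" heuristic).
    """
    frontier = [(score(start), start, [])]
    seen = {start}
    while frontier and len(seen) <= max_nodes:
        _, state, path = heapq.heappop(frontier)
        if state == goal:                 # "kernel" accepts: proof found
            return path
        for name, nxt in tactics(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (score(nxt), nxt, path + [name]))
    return None                           # search budget exhausted
```

In a real system the states are Lean/Coq goals, the candidate steps come from a language model, and only steps the proof checker accepts enter the frontier, so a found `path` is correct by construction regardless of how good the heuristic was.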
7.3 Prospects and Challenges
| Dimension | Current State | Challenges |
|---|---|---|
| Proving capability | Can solve undergraduate-level math problems | Still far from frontier mathematical research |
| Formalization coverage | Mathlib covers substantial foundational mathematics | Many advanced areas remain unformalized |
| Search efficiency | LLM guidance reduces the search space | Search spaces for complex proofs remain enormous |
| Generalization | Performs well within the training distribution | Limited generalization to entirely new problem types |
Formalization Does Not Equal Understanding
Current neural theorem proving systems can find correct proofs, but this does not mean the system "understands" mathematical concepts. The correctness of proofs is guaranteed by the formal system, not by the neural network.
References
- Biere et al., Handbook of Satisfiability
- Harrison, Handbook of Practical Logic and Automated Reasoning
- Baier & Katoen, Principles of Model Checking
- Trinh et al., "Solving olympiad geometry without human demonstrations" (Nature, 2024)
- Lean 4 Documentation
- Z3 Theorem Prover