
Automated Reasoning

Automated reasoning is the field within artificial intelligence that studies how to make computers automatically perform logical deduction. It encompasses a broad range of techniques from Boolean satisfiability (SAT) solving to formal theorem proving and serves as a foundational tool for software verification, hardware design, mathematical proof, and AI safety.

In recent years, the convergence of deep learning and formal methods (e.g., AlphaProof, neural theorem proving) has ushered in a profound transformation in automated reasoning. These notes cover SAT/SMT solving, theorem proving, formal verification, and cutting-edge directions at the intersection of formal methods and deep learning.


1. Classification of Reasoning Systems

| Dimension | Type | Description |
| --- | --- | --- |
| By logic level | Propositional logic reasoning | Variables take only true/false values |
| | First-order logic reasoning | Supports quantifiers (\(\forall, \exists\)) and predicates |
| | Higher-order logic reasoning | Variables can be functions or predicates themselves |
| By automation level | Fully automated | SAT/SMT solvers |
| | Semi-automated (interactive) | Lean, Coq, Isabelle |
| By reasoning direction | Forward reasoning | Derive from axioms and known theorems |
| | Backward reasoning | Decompose the goal into subgoals |

2. The SAT Problem and Solvers

2.1 Boolean Satisfiability

The SAT problem: Given a Boolean formula \(\varphi\) (typically in CNF), determine whether there exists a variable assignment that makes \(\varphi\) true.

\[ \varphi = (x_1 \lor \neg x_2) \land (\neg x_1 \lor x_3) \land (x_2 \lor \neg x_3) \]
  • SAT was the first problem proven to be NP-complete (Cook–Levin theorem; Cook 1971, independently Levin 1973)
  • Despite exponential worst-case complexity, modern SAT solvers perform remarkably well on practical instances
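For small instances the definition can be checked directly by enumeration. A minimal sketch for the example formula above; encoding each clause as a list of signed integers (k for \(x_k\), -k for \(\neg x_k\)) is a common convention, assumed here for illustration:

```python
from itertools import product

# CNF for the example: (x1 ∨ ¬x2) ∧ (¬x1 ∨ x3) ∧ (x2 ∨ ¬x3)
clauses = [[1, -2], [-1, 3], [2, -3]]

def brute_force_sat(clauses, n_vars):
    """Try all 2^n assignments; return a satisfying one, or None if UNSAT."""
    for bits in product([False, True], repeat=n_vars):
        assign = {i + 1: b for i, b in enumerate(bits)}
        # A clause is satisfied if some literal agrees with the assignment.
        if all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses):
            return assign
    return None

print(brute_force_sat(clauses, 3))  # {1: False, 2: False, 3: False}
```

The all-false assignment already satisfies every clause here; the exponential loop is exactly what DPLL and CDCL avoid in practice.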

2.2 The DPLL Algorithm

DPLL (Davis–Putnam–Logemann–Loveland, 1962), refining the 1960 Davis–Putnam procedure, is the classical systematic SAT-solving algorithm:

DPLL(φ):
    1. Unit Propagation:
       If a clause contains only one literal, that literal must be set to true
    2. Pure Literal Elimination:
       If a variable appears only in positive or only in negative form, assign accordingly
    3. Branching:
       Choose a variable, try assigning true/false, and solve recursively
    4. Backtracking:
       If the current assignment leads to a conflict, backtrack to the previous decision point
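The four steps above can be sketched in Python. This is an illustrative toy implementation (clauses as lists of signed integers, an assumed encoding), not an optimized solver:

```python
def dpll(clauses, assignment=None):
    """DPLL over CNF clauses given as lists of signed ints
    (k means x_k, -k means ¬x_k). Returns a model dict or None."""
    if assignment is None:
        assignment = {}
    # Simplify the formula under the current partial assignment.
    simplified = []
    for clause in clauses:
        if any(assignment.get(abs(l)) == (l > 0) for l in clause):
            continue                      # clause already satisfied
        rest = [l for l in clause if abs(l) not in assignment]
        if not rest:
            return None                   # empty clause: conflict, backtrack
        simplified.append(rest)
    if not simplified:
        return assignment                 # every clause satisfied
    # 1. Unit propagation: a one-literal clause forces its literal.
    for clause in simplified:
        if len(clause) == 1:
            l = clause[0]
            return dpll(clauses, {**assignment, abs(l): l > 0})
    # 2. Pure literal elimination: a variable with only one polarity.
    literals = {l for c in simplified for l in c}
    for l in literals:
        if -l not in literals:
            return dpll(clauses, {**assignment, abs(l): l > 0})
    # 3. Branching, with 4. backtracking via the `or`.
    v = abs(simplified[0][0])
    return (dpll(clauses, {**assignment, v: True})
            or dpll(clauses, {**assignment, v: False}))

print(dpll([[1, -2], [-1, 3], [2, -3]]))  # e.g. {1: True, 3: True, 2: True}
```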

2.3 The CDCL Algorithm

CDCL (Conflict-Driven Clause Learning) is the core algorithm of modern SAT solvers, extending DPLL with:

| Enhancement | Description |
| --- | --- |
| Conflict analysis | When a conflict occurs, analyze its cause and learn a conflict clause |
| Non-chronological backtracking | Backtrack directly to the root cause of the conflict (not just the previous decision point) |
| Clause learning | Add conflict clauses to the formula to avoid redundant search |
| Variable selection heuristics | Strategies such as VSIDS (Variable State Independent Decaying Sum) |
| Restart policies | Periodically restart the search while retaining learned clauses |

Representative solvers: MiniSat, CaDiCaL, Kissat, Z3 (SAT component)

SAT Competition

The annual SAT Competition has driven rapid advances in solver technology. Modern solvers can handle instances with millions of variables.

2.4 WalkSAT

WalkSAT is a randomized local search algorithm:

  1. Randomly generate an initial assignment
  2. Select an unsatisfied clause
  3. With probability \(p\), randomly flip a variable in that clause; with probability \(1-p\), flip the variable that minimizes the number of unsatisfied clauses
  4. Repeat until a satisfying assignment is found or a limit is reached

Effective for finding satisfying assignments but cannot prove unsatisfiability.
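The loop above can be sketched as follows. This toy version recounts all clauses on every greedy step; real implementations track per-variable "break counts" instead:

```python
import random

def walksat(clauses, n_vars, p=0.5, max_flips=100000, seed=0):
    """WalkSAT: start from a random assignment, repeatedly repair a random
    unsatisfied clause. Incomplete: may find a model, never proves UNSAT."""
    rng = random.Random(seed)
    assign = {v: rng.choice([False, True]) for v in range(1, n_vars + 1)}

    def satisfied(clause, a):
        return any(a[abs(l)] == (l > 0) for l in clause)

    for _ in range(max_flips):
        unsat_clauses = [c for c in clauses if not satisfied(c, assign)]
        if not unsat_clauses:
            return assign                      # found a satisfying assignment
        clause = rng.choice(unsat_clauses)
        if rng.random() < p:                   # noise step: flip at random
            v = abs(rng.choice(clause))
        else:                                  # greedy step: minimize unsatisfied clauses
            def cost(var):
                flipped = dict(assign)
                flipped[var] = not flipped[var]
                return sum(not satisfied(c, flipped) for c in clauses)
            v = min((abs(l) for l in clause), key=cost)
        assign[v] = not assign[v]
    return None                                # gave up; says nothing about UNSAT

model = walksat([[1, -2], [-1, 3], [2, -3]], 3)
```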


3. SMT Solvers

3.1 Satisfiability Modulo Theories

SMT (Satisfiability Modulo Theories) generalizes SAT to richer logical theories:

\[ (x + y > 3) \land (x < 2) \land (y > 0) \land (z = x \cdot y) \]
| Background Theory | Abbreviation | Example |
| --- | --- | --- |
| Linear Integer/Real Arithmetic | LIA/LRA | \(x + 2y \leq 5\) |
| Bitvectors | BV | 32-bit integer overflow detection |
| Arrays | A | \(\text{read}(\text{write}(a, i, v), i) = v\) |
| Uninterpreted Functions | UF | \(x = y \Rightarrow f(x) = f(y)\) (congruence) |
| Strings | S | Regular expression matching |

3.2 The DPLL(T) Framework

Modern SMT solvers adopt the DPLL(T) framework: a SAT solver handles the Boolean skeleton while a theory solver checks theory consistency.

SAT Solver ←→ Theory Solver(T)
  Boolean assignment → T-consistency check
  Conflict clause    ← T-conflict explanation

3.3 Introduction to Z3

Z3 (Microsoft Research) is one of the most widely used SMT solvers:

from z3 import *

x, y = Ints('x y')
s = Solver()
s.add(x + y == 10)
s.add(x > 0, y > 0)
s.add(x > y)

if s.check() == sat:
    print(s.model())  # one satisfying model, e.g. [x = 6, y = 4]

Application domains: program verification, symbolic execution, constraint solving, test generation.


4. Theorem Proving

4.1 Automated Theorem Proving

Resolution is the foundation of automated theorem proving:

  1. Negate the proposition to be proved and convert to CNF
  2. Repeatedly apply the resolution rule: from \((A \lor C)\) and \((\neg A \lor D)\), derive \((C \lor D)\)
  3. If the empty clause \(\Box\) is derived, the original proposition holds (proof by contradiction)
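For propositional clauses the procedure above can be sketched as a naive saturation loop (clauses as frozensets of signed integers, an assumed encoding; real provers add ordering restrictions and subsumption):

```python
from itertools import combinations

def resolve(c1, c2):
    """All resolvents of two clauses (frozensets of signed-int literals)."""
    return [(c1 - {l}) | (c2 - {-l}) for l in c1 if -l in c2]

def resolution_refutation(clauses):
    """Saturate under resolution; True iff the empty clause □ is derivable,
    i.e. the clause set is unsatisfiable."""
    clauses = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:
                    return True           # derived the empty clause □
                new.add(frozenset(r))
        if new <= clauses:
            return False                  # saturated without □: satisfiable
        clauses |= new

# Refute {p ∨ q, ¬p, ¬q}: resolving yields {q}, {p}, then □.
print(resolution_refutation([[1, 2], [-1], [-2]]))  # True
```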

| Automated Theorem Prover | Characteristics |
| --- | --- |
| E Prover | First-order logic, equational reasoning |
| Vampire | First-order logic, competition winner |
| SPASS | First-order logic, ordering-based strategy |

4.2 Interactive Theorem Proving

Interactive theorem provers (ITPs) require humans to provide proof strategies while the computer verifies the correctness of each step.

| Proof Assistant | Logical Foundation | Language | Notable Projects |
| --- | --- | --- | --- |
| Lean | Dependent type theory (CIC variant) | Lean 4 | Mathlib (100k+ theorems) |
| Coq | Calculus of Inductive Constructions (CIC) | Gallina | CompCert (verified C compiler) |
| Isabelle | Higher-order logic (HOL) | Isar | seL4 (verified microkernel) |
| Agda | Dependent type theory | Agda | Type theory research |

Lean 4 example (proving \(0 + n = n\) by induction; note that \(n + 0 = n\) holds definitionally for Lean's Nat, and the name add_zero is already taken, so we prove the genuinely inductive direction under a fresh name):

theorem my_zero_add (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ n ih => simp [Nat.add_succ, ih]

The Mathlib Project

Mathlib is the largest formalized mathematics library for Lean 4, covering algebra, analysis, topology, number theory, and more, with well over 100,000 formalized theorems and definitions.


5. Formal Verification

5.1 Model Checking

Model checking automatically verifies whether a finite-state system satisfies a given temporal logic property.

Input: System model \(M\) (finite state machine) + property \(\phi\) (e.g., a CTL/LTL formula)

Output: \(M \models \phi\) (satisfies) or a counterexample trace
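For an invariant (safety) property, model checking reduces to reachability over the state graph: search every reachable state and report a counterexample trace on failure. A minimal explicit-state sketch, with a hypothetical mod-4 counter as the model \(M\):

```python
from collections import deque

def check_invariant(init, next_states, invariant):
    """BFS over reachable states: return None if `invariant` holds
    everywhere, otherwise a shortest counterexample trace."""
    parent = {init: None}
    queue = deque([init])
    while queue:
        s = queue.popleft()
        if not invariant(s):
            trace = []                  # reconstruct the counterexample path
            while s is not None:
                trace.append(s)
                s = parent[s]
            return list(reversed(trace))
        for t in next_states(s):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return None                         # invariant holds in all reachable states

# Toy model (assumed for illustration): a mod-4 counter; the property
# "the counter never reaches 3" is violated, so we get a trace.
trace = check_invariant(0, lambda s: [(s + 1) % 4], lambda s: s != 3)
print(trace)  # [0, 1, 2, 3]
```

Tools like SPIN and NuSMV do this symbolically and for full CTL/LTL, but the counterexample-trace output has the same shape.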

| Tool | Application Domain |
| --- | --- |
| SPIN | Concurrent protocol verification |
| NuSMV | Hardware design verification |
| CBMC | Bounded model checking for C/C++ programs |
| TLA+ | Distributed system design (used extensively at Amazon) |

5.2 Program Verification

| Method | Description | Tools |
| --- | --- | --- |
| Hoare logic | Derive program correctness using preconditions/postconditions | Dafny, Frama-C |
| Symbolic execution | Execute programs with symbolic values instead of concrete ones | KLEE, angr |
| Abstract interpretation | Analyze program behavior over abstract domains | Astrée |
| Bounded model checking | Unroll loops to a finite bound and reduce to SAT/SMT | CBMC |

6. Formal Methods in AI

6.1 Neural Network Verification

Verify the robustness of neural networks: for all perturbations \(x'\) within an \(\epsilon\)-neighborhood of input \(x\), does the network output remain consistent?

\[ \forall x' \in B_\epsilon(x): \; \arg\max f_\theta(x') = \arg\max f_\theta(x) \]
| Verification Tool | Approach |
| --- | --- |
| \(\alpha,\beta\)-CROWN | Branch and bound + linear relaxation |
| Marabou | SMT-based verification |
| ERAN | Abstract interpretation |

The annual VNN-COMP competition benchmarks these tools on common problem sets.
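One of the simplest techniques in this family is interval bound propagation (IBP): push the \(\epsilon\)-box through each layer and compare output bounds. A sketch on a hypothetical one-hidden-layer ReLU network whose weights are chosen purely for illustration:

```python
def interval_affine(l, u, W, b):
    """Propagate interval bounds [l, u] through y = Wx + b:
    positive weights take the matching bound, negative weights the opposite."""
    lo = [bi + sum(w * (l[j] if w >= 0 else u[j]) for j, w in enumerate(row))
          for row, bi in zip(W, b)]
    hi = [bi + sum(w * (u[j] if w >= 0 else l[j]) for j, w in enumerate(row))
          for row, bi in zip(W, b)]
    return lo, hi

def interval_relu(l, u):
    return [max(x, 0.0) for x in l], [max(x, 0.0) for x in u]

# Hypothetical network: 2 inputs -> 2 hidden ReLU units -> 2 logits.
W1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
W2, b2 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]

x, eps = [1.0, 0.0], 0.1
l = [xi - eps for xi in x]
u = [xi + eps for xi in x]
l, u = interval_affine(l, u, W1, b1)
l, u = interval_relu(l, u)
l, u = interval_affine(l, u, W2, b2)

# Certificate for class 0: its lowest possible logit must beat the
# highest possible logit of class 1 over the entire ε-ball.
print("certified" if l[0] > u[1] else "inconclusive")
```

IBP is fast but loose; tools like \(\alpha,\beta\)-CROWN tighten these bounds with linear relaxations and branch and bound.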

6.2 LLM Output Verification

  • Code verification: Use formal methods to verify that LLM-generated code satisfies specifications
  • Reasoning chain verification: Verify the logical correctness of each step in Chain-of-Thought reasoning
  • Fact checking: Convert factual claims in LLM outputs into verifiable logical propositions

7. Convergence with Deep Learning

7.1 AlphaProof

AlphaProof (DeepMind, 2024) applies AlphaZero-style reinforcement learning to mathematical theorem proving:

  • Achieved silver-medal-level performance on IMO (International Mathematical Olympiad) 2024 problems, working in the Lean 4 formal environment (together with AlphaGeometry 2)
  • Models proof search as a game-like search problem
  • Uses LLMs to generate proof candidates, verified by the formal system

7.2 Neural Theorem Proving

| System | Method | Target |
| --- | --- | --- |
| GPT-f | Transformer generates proof steps | Lean/Metamath proofs |
| HTPS | HyperTree Proof Search + neural networks | Lean proof search |
| LeanDojo | LLM + Lean interactive environment | Open research platform |
| Llemma | Mathematics-specialized LLM | Mathematical reasoning and proof |
| DeepSeek-Prover | Reinforcement learning + formal verification | Lean theorem proving |

Core approach:

  1. A language model generates proof tactics
  2. A formal system verifies the correctness of each step
  3. A search algorithm (e.g., MCTS) explores the proof space
  4. Reinforcement learning learns from successful proofs
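The loop above can be sketched as best-first search over proof states. Here `generate_tactics` and `apply_tactic` are hypothetical stand-ins for the language model and the formal checker; the toy instance at the end is purely illustrative:

```python
import heapq

def best_first_proof_search(goal, generate_tactics, apply_tactic, max_nodes=1000):
    """Best-first search over proof states.
    generate_tactics(goal) -> iterable of (cost, tactic) candidates
        (in a real system these would be scored by a language model);
    apply_tactic(goal, tactic) -> list of new subgoals ([] closes the goal),
        or None if the formal checker rejects the step."""
    frontier = [(0.0, 0, [goal], [])]   # (cost, tiebreak, open goals, tactics used)
    counter = 1
    while frontier and counter < max_nodes:
        cost, _, goals, proof = heapq.heappop(frontier)
        if not goals:
            return proof                # all goals closed: proof found
        g, rest = goals[0], goals[1:]
        for step_cost, tactic in generate_tactics(g):
            subgoals = apply_tactic(g, tactic)
            if subgoals is None:
                continue                # step rejected by the checker
            heapq.heappush(frontier,
                           (cost + step_cost, counter,
                            subgoals + rest, proof + [tactic]))
            counter += 1
    return None                         # budget exhausted

# Toy instance: the "goal" is an integer, the only tactic "dec" reduces it,
# and a goal of 1 or less closes immediately.
tacs = lambda g: [(1.0, "dec")]
apply_ = lambda g, t: [] if g <= 1 else [g - 1]
print(best_first_proof_search(3, tacs, apply_))  # ['dec', 'dec', 'dec']
```

The key property, mirrored from real systems: whatever the model proposes, only checker-approved steps enter the search tree, so a returned proof is correct by construction.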

7.3 Prospects and Challenges

| Dimension | Current State | Challenges |
| --- | --- | --- |
| Proving capability | Can solve undergraduate-level math problems | Still far from frontier mathematical research |
| Formalization coverage | Mathlib covers substantial foundational mathematics | Many advanced areas remain unformalized |
| Search efficiency | LLM guidance reduces the search space | Search spaces for complex proofs remain enormous |
| Generalization | Performs well within the training distribution | Limited generalization to entirely new problem types |

Formalization Does Not Equal Understanding

Current neural theorem proving systems can find correct proofs, but this does not mean the system "understands" mathematical concepts. The correctness of proofs is guaranteed by the formal system, not by the neural network.


References

  • Biere et al., Handbook of Satisfiability
  • Harrison, Handbook of Practical Logic and Automated Reasoning
  • Baier & Katoen, Principles of Model Checking
  • Trinh et al., "Solving olympiad geometry without human demonstrations" (Nature, 2024)
  • Lean 4 Documentation
  • Z3 Theorem Prover
