Automated Reasoning
Automated reasoning is the field within artificial intelligence that studies how to make computers automatically perform logical deduction. It encompasses a broad range of techniques from Boolean satisfiability (SAT) solving to formal theorem proving and serves as a foundational tool for software verification, hardware design, mathematical proof, and AI safety.
In recent years, the convergence of deep learning and formal methods (e.g., AlphaProof, neural theorem proving) has ushered in a profound transformation in automated reasoning. These notes cover SAT/SMT solving, theorem proving, formal verification, and cutting-edge directions at the intersection of formal methods and deep learning.
1. Classification of Reasoning Systems
| Dimension | Type | Description |
|---|---|---|
| By logic level | Propositional logic reasoning | Variables take only true/false values |
| By logic level | First-order logic reasoning | Supports quantifiers (\(\forall, \exists\)) and predicates |
| By logic level | Higher-order logic reasoning | Variables can themselves be functions or predicates |
| By automation level | Fully automated | SAT/SMT solvers |
| By automation level | Semi-automated (interactive) | Lean, Coq, Isabelle |
| By reasoning direction | Forward reasoning | Derive new facts from axioms and known theorems |
| By reasoning direction | Backward reasoning | Decompose the goal into subgoals |
2. The SAT Problem and Solvers
2.1 Boolean Satisfiability
The SAT problem: Given a Boolean formula \(\varphi\) (typically in CNF), determine whether there exists a variable assignment that makes \(\varphi\) true.
- SAT was the first problem proven to be NP-complete (Cook-Levin theorem, 1971)
- Despite exponential worst-case complexity, modern SAT solvers perform remarkably well on practical instances
2.2 The DPLL Algorithm
DPLL (Davis-Putnam-Logemann-Loveland, 1962) is the earliest systematic SAT solving algorithm:
```text
DPLL(φ):
  1. Unit propagation:
     if a clause contains only one literal, that literal must be set to true
  2. Pure literal elimination:
     if a variable appears only positively or only negatively, assign it accordingly
  3. Branching:
     choose a variable, try assigning true and false, and solve recursively
  4. Backtracking:
     if the current assignment leads to a conflict, backtrack to the previous decision point
```
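The steps above can be sketched as a short recursive procedure. This is a minimal illustration, not a production solver: clauses are DIMACS-style sets of non-zero integers (variable `x` as `x`, its negation as `-x`), the function names are mine, and pure-literal elimination is omitted to keep the sketch small.

```python
from typing import Optional

def simplify(clauses, lit):
    """Assign literal `lit` true: drop satisfied clauses, shrink the rest."""
    out = []
    for c in clauses:
        if lit in c:
            continue            # clause satisfied, drop it
        out.append(c - {-lit})  # falsified literal removed from clause
    return out

def dpll(clauses: list, assignment: dict) -> Optional[dict]:
    """Return a satisfying assignment {var: bool} for a CNF, or None (UNSAT)."""
    # Unit propagation: a one-literal clause forces that literal true.
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if len(clause) == 1:
                lit = next(iter(clause))
                clauses = simplify(clauses, lit)
                assignment[abs(lit)] = lit > 0
                changed = True
                break
    if not clauses:                 # every clause satisfied
        return assignment
    if frozenset() in clauses:      # empty clause derived: conflict
        return None
    # Branching: pick a variable, try true then false, recurse.
    var = abs(next(iter(next(iter(clauses)))))
    for value in (var, -var):
        result = dpll(simplify(clauses, value), {**assignment, var: value > 0})
        if result is not None:
            return result
    return None                     # both branches failed: backtrack
```

Backtracking is implicit in the recursion: a failed branch simply returns `None` and the caller tries the opposite assignment.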
2.3 The CDCL Algorithm
CDCL (Conflict-Driven Clause Learning) is the core algorithm of modern SAT solvers, extending DPLL with:
| Enhancement | Description |
|---|---|
| Conflict analysis | When a conflict occurs, analyze its cause and learn a conflict clause |
| Non-chronological backtracking | Backtrack directly to the root cause of the conflict (not just the previous decision point) |
| Clause learning | Add conflict clauses to the formula to avoid redundant search |
| Variable selection heuristics | Strategies such as VSIDS (Variable State Independent Decaying Sum) |
| Restart policies | Periodically restart the search while retaining learned clauses |
Representative solvers: MiniSat, CaDiCaL, Kissat, Z3 (SAT component)
SAT Competition
The annual SAT Competition has driven rapid advances in solver technology. Modern solvers can handle instances with millions of variables.
2.4 WalkSAT
WalkSAT is a randomized local search algorithm:
- Randomly generate an initial assignment
- Select an unsatisfied clause
- With probability \(p\), randomly flip a variable in that clause; with probability \(1-p\), flip the variable that minimizes the number of unsatisfied clauses
- Repeat until a satisfying assignment is found or a limit is reached
Effective for finding satisfying assignments but cannot prove unsatisfiability.
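The loop above translates almost directly into code. A compact sketch, with function names and the greedy scoring of my own choosing (clauses are tuples of signed integer literals, as in the DPLL sketch):

```python
import random

def walksat(clauses, n_vars, p=0.5, max_flips=10_000, seed=0):
    """Randomized local search; returns {var: bool} or None if budget runs out."""
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}

    def sat(clause):
        return any(assign[abs(l)] == (l > 0) for l in clause)

    def num_unsat():
        return sum(not sat(c) for c in clauses)

    for _ in range(max_flips):
        unsat = [c for c in clauses if not sat(c)]
        if not unsat:
            return assign                      # all clauses satisfied
        clause = rng.choice(unsat)             # pick an unsatisfied clause
        if rng.random() < p:
            var = abs(rng.choice(clause))      # random-walk step
        else:
            # Greedy step: flip the variable that minimizes unsatisfied clauses.
            best = None
            for l in clause:
                v = abs(l)
                assign[v] = not assign[v]      # tentatively flip
                score = num_unsat()
                assign[v] = not assign[v]      # undo
                if best is None or score < best[0]:
                    best = (score, v)
            var = best[1]
        assign[var] = not assign[var]
    return None   # budget exhausted -- says nothing about unsatisfiability
```

Note that returning `None` here is inconclusive, which is exactly the asymmetry described above: local search can certify SAT but never UNSAT.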
3. SMT Solvers
3.1 Satisfiability Modulo Theories
SMT (Satisfiability Modulo Theories) generalizes SAT to richer logical theories:
| Background Theory | Abbreviation | Example |
|---|---|---|
| Linear Integer/Real Arithmetic | LIA/LRA | \(x + 2y \leq 5\) |
| Bitvectors | BV | 32-bit integer overflow detection |
| Arrays | AX | \(\text{read}(\text{write}(a, i, v), i) = v\) |
| Uninterpreted Functions | UF | Congruence: \(x = y \Rightarrow f(x) = f(y)\) |
| Strings | S | Regular expression matching |
3.2 The DPLL(T) Framework
Modern SMT solvers adopt the DPLL(T) framework: a SAT solver handles the Boolean skeleton while a theory solver checks theory consistency.
```text
SAT solver  ←→  theory solver (T)
    Boolean assignment  →  T-consistency check
    conflict clause     ←  T-conflict explanation
```
3.3 Introduction to Z3
Z3 (Microsoft Research) is the most widely used SMT solver:
```python
from z3 import *

x, y = Ints('x y')
s = Solver()
s.add(x + y == 10)
s.add(x > 0, y > 0)
s.add(x > y)
if s.check() == sat:
    print(s.model())  # e.g. [x = 6, y = 4]
```
Application domains: program verification, symbolic execution, constraint solving, test generation.
4. Theorem Proving
4.1 Automated Theorem Proving
Resolution is the foundation of automated theorem proving:
- Negate the proposition to be proved and convert to CNF
- Repeatedly apply the resolution rule: from \((A \lor C)\) and \((\neg A \lor D)\), derive \((C \lor D)\)
- If the empty clause \(\Box\) is derived, the original proposition holds (proof by contradiction)
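The refutation loop above can be sketched for the propositional case in a few lines (function names are mine; literals are signed integers as in the earlier SAT sketches, and the goal must already be negated and added to the clause set):

```python
from itertools import combinations

def resolve(c1: frozenset, c2: frozenset):
    """Yield all resolvents of two clauses: from (A ∨ C) and (¬A ∨ D), derive (C ∨ D)."""
    for lit in c1:
        if -lit in c2:
            yield (c1 - {lit}) | (c2 - {-lit})

def resolution_refutes(clauses) -> bool:
    """Saturate under resolution; True iff the empty clause is derivable,
    i.e. the clause set is unsatisfiable."""
    known = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for r in resolve(c1, c2):
                r = frozenset(r)
                if not r:
                    return True        # empty clause: contradiction found
                if r not in known:
                    new.add(r)
        if not new:
            return False               # saturated without the empty clause
        known |= new
```

For example, to prove \(P\) from \(P \lor Q\) and \(\neg Q\), refute the set \(\{P \lor Q,\ \neg Q,\ \neg P\}\): resolving the first two gives \(P\), which resolves with \(\neg P\) to the empty clause.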
| Automated Theorem Prover | Characteristics |
|---|---|
| E Prover | First-order logic, equational reasoning |
| Vampire | First-order logic, competition winner |
| SPASS | First-order logic, ordering-based strategy |
4.2 Interactive Theorem Proving
Interactive theorem provers (ITPs) require humans to provide proof strategies while the computer verifies the correctness of each step.
| Proof Assistant | Logical Foundation | Language | Notable Projects |
|---|---|---|---|
| Lean | Dependent type theory (CIC variant) | Lean 4 | Mathlib (100k+ theorems) |
| Coq | Calculus of Inductive Constructions (CIC) | Gallina | CompCert (verified C compiler) |
| Isabelle | Higher-order logic (HOL) | Isar | seL4 (verified microkernel) |
| Agda | Dependent type theory | Agda | Type theory research |
Lean 4 example (proving \(0 + n = n\), which, unlike \(n + 0 = n\), does not hold by mere computation and genuinely requires induction):
```lean
theorem zero_add (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ n ih => rw [Nat.add_succ, ih]
```
The Mathlib Project
Mathlib is the largest formalized mathematics library for Lean 4, covering algebra, analysis, topology, number theory, and more, with well over 100,000 formalized theorems and definitions.
5. Formal Verification
5.1 Model Checking
Model checking automatically verifies whether a finite-state system satisfies a given temporal logic property.
Input: System model \(M\) (finite state machine) + property \(\phi\) (e.g., a CTL/LTL formula)
Output: \(M \models \phi\) (satisfies) or a counterexample trace
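For safety properties ("nothing bad is reachable"), model checking reduces to graph reachability over the state space. A minimal explicit-state sketch (the API shape and function names are my own illustration; real tools add symbolic representations and full temporal logic):

```python
from collections import deque

def check_invariant(init, transitions, invariant):
    """Breadth-first check that every reachable state satisfies `invariant`.

    init: iterable of initial states; transitions(s) -> iterable of successors.
    Returns (True, None), or (False, trace) where trace is a counterexample
    path from an initial state to a violating state.
    """
    init = list(init)
    frontier = deque((s, (s,)) for s in init)
    seen = set(init)
    while frontier:
        state, trace = frontier.popleft()
        if not invariant(state):
            return False, list(trace)      # shortest counterexample (BFS order)
        for nxt in transitions(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, trace + (nxt,)))
    return True, None
```

Returning the violating path, not just a verdict, mirrors what makes model checkers practical: the counterexample trace tells the engineer exactly how the bad state is reached.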
| Tool | Application Domain |
|---|---|
| SPIN | Concurrent protocol verification |
| NuSMV | Hardware design verification |
| CBMC | Bounded model checking for C/C++ programs |
| TLA+ | Distributed system design (used extensively at Amazon) |
5.2 Program Verification
| Method | Description | Tools |
|---|---|---|
| Hoare logic | Derive program correctness using preconditions/postconditions | Dafny, Frama-C |
| Symbolic execution | Execute programs with symbolic values instead of concrete ones | KLEE, angr |
| Abstract interpretation | Analyze program behavior over abstract domains | Astrée |
| Bounded model checking | Unroll loops to a finite bound and reduce to SAT/SMT | CBMC |
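To make the bounded-model-checking row concrete, here is a toy sketch for a program whose assertion fails on odd inputs. Exhaustive input enumeration stands in for the single SAT/SMT query a real tool such as CBMC would emit; the function name and program are my own illustration:

```python
def bmc_counter(k: int, inputs):
    """Bounded model checking sketch for the toy program

        x = n
        while x > 0: x = x - 2
        assert x == 0   (checked when the loop has exited)

    The loop is unrolled k times. Each candidate n plays the role of a
    symbolic input; a real BMC tool would encode the unrolled program as
    one SAT/SMT formula instead of enumerating.
    Returns a failing input, or None if no violation is found within bound k.
    """
    for n in inputs:
        x = n
        for _ in range(k):          # unroll the loop to depth k
            if x > 0:
                x -= 2
        if x <= 0 and x != 0:       # loop exited within bound, assertion broken
            return n
    return None
```

Note the "bounded" caveat: a `None` result only rules out violations reachable within `k` unrollings, which is why BMC is a bug-finding method rather than a full proof by default.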
6. Formal Methods in AI
6.1 Neural Network Verification
Verify the robustness of neural networks: for every perturbed input \(x'\) within an \(\epsilon\)-neighborhood of \(x\), does the network still produce the same prediction?
| Verification Tool | Approach |
|---|---|
| \(\alpha,\beta\)-CROWN | Branch and bound + linear relaxation |
| Marabou | SMT-based verification |
| ERAN | Abstract interpretation |

The annual VNN-COMP (Verification of Neural Networks Competition) benchmarks these tools against each other.
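The simplest sound (but incomplete) technique in this family is interval bound propagation: push the input box through the network, taking worst-case interval ends at each weight. A minimal sketch with my own encoding (layers as `(weights, biases)` nested lists, ReLU on hidden layers):

```python
def ibp_forward(layers, lo, hi):
    """Propagate elementwise lower/upper bounds through a ReLU network.

    The returned bounds are sound: every input in [lo, hi] maps to an
    output inside [out_lo, out_hi] (though the box may be loose).
    """
    for idx, (W, b) in enumerate(layers):
        new_lo, new_hi = [], []
        for row, bias in zip(W, b):
            # Each weight picks the interval end that minimizes / maximizes it.
            l = bias + sum(w * (lo[j] if w >= 0 else hi[j]) for j, w in enumerate(row))
            h = bias + sum(w * (hi[j] if w >= 0 else lo[j]) for j, w in enumerate(row))
            new_lo.append(l)
            new_hi.append(h)
        if idx < len(layers) - 1:                     # ReLU on hidden layers only
            new_lo = [max(0.0, v) for v in new_lo]
            new_hi = [max(0.0, v) for v in new_hi]
        lo, hi = new_lo, new_hi
    return lo, hi

def certify_robust(layers, x, eps, target: int) -> bool:
    """True if the target logit provably beats all others on the eps-box.
    A sufficient check: False means "could not certify", not "not robust"."""
    lo, hi = ibp_forward(layers, [v - eps for v in x], [v + eps for v in x])
    return all(lo[target] > hi[j] for j in range(len(lo)) if j != target)
```

Tools like \(\alpha,\beta\)-CROWN start from much tighter linear relaxations than this box, then branch and bound on unstable ReLUs; the soundness argument is the same.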
6.2 LLM Output Verification
- Code verification: Use formal methods to verify that LLM-generated code satisfies specifications
- Reasoning chain verification: Verify the logical correctness of each step in Chain-of-Thought reasoning
- Fact checking: Convert factual claims in LLM outputs into verifiable logical propositions
7. Convergence with Deep Learning
7.1 AlphaProof
AlphaProof (DeepMind, 2024) applies AlphaZero-style reinforcement learning to mathematical theorem proving:
- Together with AlphaGeometry 2, solved 4 of the 6 problems of IMO 2024 (silver-medal standard), working in the Lean 4 formal environment
- Models proof search as a game-like search problem
- Uses LLMs to generate proof candidates, verified by the formal system
7.2 Neural Theorem Proving
| System | Method | Target |
|---|---|---|
| GPT-f | Transformer generates proof steps | Lean/Metamath proofs |
| HTPS | Hyper-tree search + neural networks | Lean proof search |
| LeanDojo | LLM + Lean interactive environment | Open research platform |
| Llemma | Mathematics-specialized LLM | Mathematical reasoning and proof |
| DeepSeek-Prover | Reinforcement learning + formal verification | Lean theorem proving |
Core approach:
- A language model generates proof tactics
- A formal system verifies the correctness of each step
- A search algorithm (e.g., MCTS) explores the proof space
- Reinforcement learning learns from successful proofs
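Stripped of the learning component, this loop is a guided search over proof states. A deliberately toy skeleton: a hand-written `score` stands in for the policy/value network, `tactics` for the tactic generator, and the equality check against `goal` for kernel verification (all names and the toy domain are my own):

```python
import heapq

def best_first_prove(start, goal, tactics, score, max_nodes=1000):
    """Best-first search over states; returns the tactic sequence or None.

    tactics(state) -> iterable of (tactic_name, next_state);
    score(state)   -> lower is more promising (the "learned" heuristic).
    """
    frontier = [(score(start), start, [])]
    seen = {start}
    while frontier and len(seen) <= max_nodes:
        _, state, path = heapq.heappop(frontier)
        if state == goal:                 # "kernel" accepts: proof found
            return path
        for name, nxt in tactics(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (score(nxt), nxt, path + [name]))
    return None                           # search budget exhausted
```

In a real system the states are Lean/Coq goals, the candidate steps come from a language model, and only steps the proof checker accepts enter the frontier, so a found `path` is correct by construction regardless of how good the heuristic was.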
7.3 Prospects and Challenges
| Dimension | Current State | Challenges |
|---|---|---|
| Proving capability | Can solve undergraduate-level math problems | Still far from frontier mathematical research |
| Formalization coverage | Mathlib covers substantial foundational mathematics | Many advanced areas remain unformalized |
| Search efficiency | LLM guidance reduces the search space | Search spaces for complex proofs remain enormous |
| Generalization | Performs well within the training distribution | Limited generalization to entirely new problem types |
Formalization Does Not Equal Understanding
Current neural theorem proving systems can find correct proofs, but this does not mean the system "understands" mathematical concepts. The correctness of proofs is guaranteed by the formal system, not by the neural network.
References
- Biere et al., Handbook of Satisfiability
- Harrison, Handbook of Practical Logic and Automated Reasoning
- Baier & Katoen, Principles of Model Checking
- Trinh et al., "Solving olympiad geometry without human demonstrations" (Nature, 2024)
- Lean 4 Documentation
- Z3 Theorem Prover