Dynamic Programming

The Idea Behind Dynamic Programming

The core idea of dynamic programming is to break a large problem down into smaller subproblems, and by storing and reusing the solutions to these subproblems, avoid redundant computation and efficiently solve the original problem.

If the optimal solution to a problem contains the optimal solutions to its subproblems, then the problem exhibits the Optimal Substructure property.

  • Example: Suppose we want to find the shortest path from A to C. If B lies on the shortest path from A to C, then the sub-path from A to B and the sub-path from B to C must each be the shortest paths between their respective endpoints.

During the process of decomposing a large problem, if the same subproblems are computed multiple times, then the problem exhibits the Overlapping Subproblems property.

  • Example: Computing the Fibonacci number \(F(n)\). Computing \(F(5)\) requires \(F(4)\) and \(F(3)\); computing \(F(4)\) in turn requires \(F(3)\) and \(F(2)\). Here, \(F(3)\) is computed more than once.

Memoization (Top-Down)

Start from the large problem and recursively decompose it into subproblems. Before computing each subproblem, check whether its solution has already been computed and stored.

  • Procedure:
    1. Define a recursive function to solve the problem.
    2. Use a lookup table (e.g., an array or hash map) to store solutions to subproblems.
    3. At the beginning of the recursive function, check the lookup table: if the solution is already stored, return it directly; otherwise, compute the solution, store it in the table, and then return it.
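
As a minimal sketch of these three steps applied to Fibonacci (the function name fib_memo and the plain-dict lookup table are illustrative choices):

```python
def fib_memo(n: int, memo=None) -> int:
    if memo is None:
        memo = {}  # lookup table for subproblem solutions
    if n in memo:
        return memo[n]  # solution already stored: return it directly
    if n <= 1:
        memo[n] = n  # base cases F(0) = 0, F(1) = 1
    else:
        # otherwise compute the solution, store it, then return it
        memo[n] = fib_memo(n - 1, memo) + fib_memo(n - 2, memo)
    return memo[n]
```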

Tabulation (Bottom-Up)

Start from the smallest subproblems and work upward, computing solutions iteratively until the original problem is solved.

  • Procedure:
    1. Define the states (subproblems).
    2. Create a table (typically an array dp[]) to store solutions to all subproblems.
    3. Determine the base cases and fill in the initial values of the table.
    4. Using the state transition equation, compute and fill in the table from small to large (or front to back).
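
The same problem solved bottom-up, as a minimal sketch of the four steps above (the name fib_tab is an illustrative choice):

```python
def fib_tab(n: int) -> int:
    if n <= 1:
        return n
    dp = [0] * (n + 1)  # step 2: table for all subproblems
    dp[0], dp[1] = 0, 1  # step 3: base cases
    for i in range(2, n + 1):  # step 4: fill from small to large
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]
```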

Problem-Solving Steps

Most dynamic programming problems can be solved using the following steps:

  1. Identify the subproblems — for example, in the knapsack problem, the subproblem is dp[i][j].
  2. Determine the base cases — for example, in the Fibonacci sequence, dp[0]=0 and dp[1]=1 are the base cases.
  3. Derive the state transition equation — this is the core of dynamic programming. Once the transition equation is found, even the hardest problem becomes straightforward.
  4. Determine the computation order — for 1D-DP, this is typically from 0 to n; for 2D-DP, it is usually row-by-row or column-by-column in increasing order.

Sequence DP

Sequence DP: These problems typically operate on a one-dimensional sequence, where the state \(DP[i]\) depends on earlier elements in the sequence.

Fibonacci Sequence

The Fibonacci Sequence is the most classic introductory DP problem. Given \(n\), find the \(n\)-th Fibonacci number.

State definition: \(dp[i]\) represents the \(i\)-th Fibonacci number.

Base Case:

\[ dp[0] = 0, \quad dp[1] = 1 \]

State transition equation:

\[ dp[i] = dp[i-1] + dp[i-2], \quad i \geq 2 \]

Time complexity: \(O(n)\), Space complexity: \(O(1)\) (after rolling variable optimization)

def fib(n: int) -> int:
    if n <= 1:
        return n
    prev, curr = 0, 1
    for _ in range(2, n + 1):
        prev, curr = curr, prev + curr
    return curr

The naive recursive approach has a time complexity of \(O(2^n)\) due to a large number of overlapping subproblems. Using DP (Tabulation or Memoization) reduces the time complexity to \(O(n)\). This is the quintessential demonstration of the dynamic programming philosophy: eliminating redundant computation by storing subproblem solutions.

Longest Increasing Subsequence (LIS)

Longest Increasing Subsequence: Given an integer array \(nums\), find the length of the longest strictly increasing subsequence. Note that a subsequence does not need to be contiguous, but must preserve the relative order.

State definition: \(dp[i]\) represents the length of the longest increasing subsequence ending at \(nums[i]\).

Base Case:

\[ dp[i] = 1, \quad \forall \, i \in [0, n-1] \]

Each element by itself forms a subsequence of length 1.

State transition equation:

\[ dp[i] = \max_{0 \leq j < i, \, nums[j] < nums[i]} \left( dp[j] + 1 \right) \]

The final answer is \(\max(dp[0], dp[1], \ldots, dp[n-1])\).

Time complexity: \(O(n^2)\), Space complexity: \(O(n)\)

def lis_dp(nums: list[int]) -> int:
    n = len(nums)
    if n == 0:
        return 0
    dp = [1] * n
    for i in range(1, n):
        for j in range(i):
            if nums[j] < nums[i]:
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp)

\(O(n \log n)\) Greedy + Binary Search optimization: Maintain an array \(tails\), where \(tails[i]\) represents the smallest possible tail element of an increasing subsequence of length \(i+1\). For each element in \(nums\), use binary search to find the first position in \(tails\) that is \(\geq\) the current element and replace it; if no such position exists, append the element to the end.

import bisect

def lis_bisect(nums: list[int]) -> int:
    tails = []
    for x in nums:
        pos = bisect.bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)

The intuition behind the greedy strategy: to make the increasing subsequence as long as possible, we want the tail element of each length to be as small as possible, so that subsequent elements are more likely to extend the subsequence.

Kadane's Maximum Subarray Sum

Maximum Subarray Sum: Given an integer array \(nums\), find a contiguous subarray with the largest sum (containing at least one element) and return that sum. This is the classic Kadane's Algorithm.

State definition: \(dp[i]\) represents the maximum sum of a contiguous subarray ending at \(nums[i]\).

Base Case:

\[ dp[0] = nums[0] \]

State transition equation:

\[ dp[i] = \max\left( nums[i], \quad dp[i-1] + nums[i] \right) \]

Core idea: at each position \(i\), either extend the previous subarray by appending the current element (\(dp[i-1] + nums[i]\)), or start a new subarray from the current element (\(nums[i]\)). If the accumulated sum so far is negative, it is better to start fresh from the current element.

The final answer is \(\max(dp[0], dp[1], \ldots, dp[n-1])\).

Time complexity: \(O(n)\), Space complexity: \(O(1)\) (only one variable is needed to track the previous state)

def max_subarray(nums: list[int]) -> int:
    max_sum = curr_sum = nums[0]
    for i in range(1, len(nums)):
        curr_sum = max(nums[i], curr_sum + nums[i])
        max_sum = max(max_sum, curr_sum)
    return max_sum

Maximum Product Subarray

Maximum Product Subarray: Given an integer array \(nums\), find a contiguous subarray with the largest product and return that product.

The key difference from the maximum subarray sum problem is that a negative number multiplied by a negative number becomes positive. Therefore, we need to track both the maximum product and the minimum product ending at the current position.

State definition:

  • \(dp\_max[i]\) represents the maximum product of a contiguous subarray ending at \(nums[i]\)
  • \(dp\_min[i]\) represents the minimum product of a contiguous subarray ending at \(nums[i]\)

Base Case:

\[ dp\_max[0] = dp\_min[0] = nums[0] \]

State transition equation:

\[ dp\_max[i] = \max\left( nums[i], \quad dp\_max[i-1] \times nums[i], \quad dp\_min[i-1] \times nums[i] \right) \]
\[ dp\_min[i] = \min\left( nums[i], \quad dp\_max[i-1] \times nums[i], \quad dp\_min[i-1] \times nums[i] \right) \]

When \(nums[i]\) is negative, the minimum product from the previous step multiplied by this negative number may become the maximum product — this is why we need to track both the maximum and minimum simultaneously.

Time complexity: \(O(n)\), Space complexity: \(O(1)\)

def max_product(nums: list[int]) -> int:
    result = cur_max = cur_min = nums[0]
    for i in range(1, len(nums)):
        candidates = (nums[i], cur_max * nums[i], cur_min * nums[i])
        cur_max, cur_min = max(candidates), min(candidates)
        result = max(result, cur_max)
    return result

House Robber

House Robber: A thief plans to rob houses along a street, where each house contains a certain amount of cash. The constraint is that adjacent houses cannot both be robbed (as this would trigger the alarm). What is the maximum amount the thief can steal?

State definition: \(dp[i]\) represents the maximum amount obtainable when considering up to the \(i\)-th house.

Base Case:

\[ dp[0] = nums[0], \quad dp[1] = \max(nums[0], nums[1]) \]

State transition equation:

\[ dp[i] = \max\left( dp[i-1], \quad dp[i-2] + nums[i] \right) \]

For the \(i\)-th house, there are two choices: skip it (profit is \(dp[i-1]\)), or rob it (profit is \(dp[i-2] + nums[i]\); since adjacent houses cannot both be robbed, the transition comes from \(i-2\)).

Time complexity: \(O(n)\), Space complexity: \(O(1)\) (after rolling variable optimization)

def rob(nums: list[int]) -> int:
    if len(nums) <= 2:
        return max(nums)
    prev2, prev1 = nums[0], max(nums[0], nums[1])
    for i in range(2, len(nums)):
        prev2, prev1 = prev1, max(prev1, prev2 + nums[i])
    return prev1

Two-Sequence DP

This category of problems deals with the relationship between two sequences. The state \(DP[i][j]\) represents the optimal solution formed by the first \(i\) elements of the first sequence and the first \(j\) elements of the second sequence.

Longest Common Subsequence (LCS)

Longest Common Subsequence: Given two strings \(X\) and \(Y\), find the length of their longest common subsequence. A subsequence does not need to be contiguous but must preserve the relative order.

State definition: \(dp[i][j]\) represents the length of the longest common subsequence of the first \(i\) characters of \(X\) and the first \(j\) characters of \(Y\).

Base Case:

\[ dp[i][0] = 0, \quad dp[0][j] = 0 \]

When either string is empty, the LCS length is 0.

State transition equation:

\[ dp[i][j] = \begin{cases} dp[i-1][j-1] + 1 & \text{if } X[i-1] = Y[j-1] \\ \max\left( dp[i-1][j], \quad dp[i][j-1] \right) & \text{if } X[i-1] \neq Y[j-1] \end{cases} \]

If the two characters match, add 1 to the diagonal (upper-left) state; otherwise, take the larger value from the cell above or to the left.

Time complexity: \(O(mn)\), Space complexity: \(O(mn)\) (can be optimized to \(O(\min(m, n))\))

def lcs(X: str, Y: str) -> int:
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

Edit Distance

Edit Distance (Levenshtein Distance): Given two strings \(word1\) and \(word2\), find the minimum number of operations required to convert \(word1\) into \(word2\). The allowed operations are: Insert, Delete, and Replace a single character.

State definition: \(dp[i][j]\) represents the minimum number of operations needed to convert the first \(i\) characters of \(word1\) into the first \(j\) characters of \(word2\).

Base Case:

\[ dp[i][0] = i, \quad dp[0][j] = j \]

Converting a non-empty string to an empty string requires deleting all characters; converting an empty string to a non-empty string requires inserting all characters.

State transition equation:

\[ dp[i][j] = \begin{cases} dp[i-1][j-1] & \text{if } word1[i-1] = word2[j-1] \\ 1 + \min \begin{cases} dp[i-1][j] & \text{(delete)} \\ dp[i][j-1] & \text{(insert)} \\ dp[i-1][j-1] & \text{(replace)} \end{cases} & \text{otherwise} \end{cases} \]

The meaning of each operation:

  • \(dp[i-1][j] + 1\): Delete \(word1[i-1]\), then convert the first \(i-1\) characters of \(word1\) into the first \(j\) characters of \(word2\)
  • \(dp[i][j-1] + 1\): First convert the first \(i\) characters of \(word1\) into the first \(j-1\) characters of \(word2\), then insert \(word2[j-1]\)
  • \(dp[i-1][j-1] + 1\): Replace \(word1[i-1]\) with \(word2[j-1]\)

Time complexity: \(O(mn)\), Space complexity: \(O(mn)\)

def edit_distance(word1: str, word2: str) -> int:
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[m][n]

Edit Distance has broad applications in natural language processing (NLP), such as spell checking, DNA sequence alignment, and more.

Interval DP

Interval DP: This category of problems defines states over an interval \([i, j]\), with state transitions performed by enumerating split points or shrinking the interval. The computation order typically proceeds from shorter to longer intervals.

Longest Palindromic Subsequence

Longest Palindromic Subsequence: Given a string \(s\), find the length of the longest palindromic subsequence. The subsequence does not need to be contiguous.

State definition: \(dp[i][j]\) represents the length of the longest palindromic subsequence in the substring \(s[i..j]\) (from index \(i\) to \(j\)).

Base Case:

\[ dp[i][i] = 1 \quad (\text{a single character is itself a palindrome}) \]

State transition equation:

\[ dp[i][j] = \begin{cases} dp[i+1][j-1] + 2 & \text{if } s[i] = s[j] \\ \max\left( dp[i+1][j], \quad dp[i][j-1] \right) & \text{if } s[i] \neq s[j] \end{cases} \]

If the two endpoint characters are equal, they can be included in the palindromic subsequence; otherwise, try removing either the left or right endpoint and take the larger value.

Computation order: Since \(dp[i][j]\) depends on \(dp[i+1][...]\), \(i\) must be iterated from large to small (or equivalently, iterate by interval length from small to large).

Time complexity: \(O(n^2)\), Space complexity: \(O(n^2)\)

def longest_palindrome_subseq(s: str) -> int:
    n = len(s)
    dp = [[0] * n for _ in range(n)]
    for i in range(n):
        dp[i][i] = 1
    for length in range(2, n + 1):      # interval length
        for i in range(n - length + 1):  # start index
            j = i + length - 1           # end index
            if s[i] == s[j]:
                dp[i][j] = dp[i + 1][j - 1] + 2
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j - 1])
    return dp[0][n - 1]

An alternative approach is to reverse \(s\) to obtain \(s'\), then compute the LCS of \(s\) and \(s'\) — the result is equivalent to the longest palindromic subsequence.
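
A minimal sketch of that equivalence, with the LCS table inlined so the function stands alone (the name lps_via_lcs is illustrative):

```python
def lps_via_lcs(s: str) -> int:
    # LPS(s) equals LCS(s, reverse(s))
    t = s[::-1]
    n = len(s)
    dp = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][n]
```

For example, lps_via_lcs("bbbab") returns 4 (the subsequence "bbbb").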

Longest Palindromic Substring (LPS)

Longest Palindromic Substring: Given a string \(s\), find the longest contiguous palindromic substring. Note that unlike the longest palindromic subsequence, the substring must be contiguous.

State definition: \(dp[i][j]\) is a boolean indicating whether \(s[i..j]\) is a palindrome.

Base Case:

\[ dp[i][i] = \text{True} \quad (\text{single character}) \]
\[ dp[i][i+1] = (s[i] = s[i+1]) \quad (\text{two characters}) \]

State transition equation:

\[ dp[i][j] = \left( s[i] = s[j] \right) \wedge dp[i+1][j-1], \quad j - i \geq 2 \]

That is: if the two endpoint characters are equal and the inner substring is also a palindrome, then \(s[i..j]\) is a palindrome. During iteration, record the longest palindromic substring found.

Time complexity: \(O(n^2)\), Space complexity: \(O(n^2)\)

def longest_palindrome_substr(s: str) -> str:
    n = len(s)
    if n < 2:
        return s
    dp = [[False] * n for _ in range(n)]
    start, max_len = 0, 1
    for i in range(n):
        dp[i][i] = True
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            if s[i] == s[j]:
                dp[i][j] = (length == 2) or dp[i + 1][j - 1]
            if dp[i][j] and length > max_len:
                start, max_len = i, length
    return s[start:start + max_len]

A more efficient solution is Manacher's Algorithm, which finds the longest palindromic substring in \(O(n)\) time. Manacher's algorithm exploits the symmetry of palindromes by maintaining a "palindrome radius array" to avoid redundant comparisons. Additionally, the center expansion method is another commonly used approach with \(O(n^2)\) time and \(O(1)\) space, expanding outward from each character (or each pair of adjacent characters) to find palindromes.
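
A minimal sketch of the center expansion method just described (the function and helper names are illustrative):

```python
def longest_palindrome_expand(s: str) -> str:
    if not s:
        return ""

    def expand(l: int, r: int) -> tuple[int, int]:
        # Grow outward while the endpoints match; return the final bounds.
        while l >= 0 and r < len(s) and s[l] == s[r]:
            l -= 1
            r += 1
        return l + 1, r - 1

    start, max_len = 0, 1
    for i in range(len(s)):
        # Odd-length centers at s[i], even-length centers between s[i] and s[i+1]
        for l, r in (expand(i, i), expand(i, i + 1)):
            if r - l + 1 > max_len:
                start, max_len = l, r - l + 1
    return s[start:start + max_len]
```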

RNA Folding

RNA Secondary Structure Prediction: Given an RNA sequence \(s = b_1 b_2 \ldots b_n\), where each \(b_i \in \{A, U, G, C\}\), find the maximum number of base pairings. The valid pairings in RNA are A-U and G-C (Watson-Crick base pairs). The pairing rules are as follows:

  • Each base can participate in at most one pairing
  • Pairings cannot cross (i.e., if \((i, j)\) and \((k, l)\) are both pairings and \(i < k\), then either \(j < k\) or \(l < j\))
  • The two bases in a pairing must be more than 4 positions apart (\(j - i > 4\)), i.e., at least 4 bases must lie between them, to ensure physical foldability

This is a classic interval DP problem.

State definition: \(dp[i][j]\) represents the maximum number of pairings in the subsequence \(b_i, b_{i+1}, \ldots, b_j\).

Base Case:

\[ dp[i][j] = 0, \quad \text{if } j - i \leq 4 \]

Subsequences with a gap of 4 or fewer cannot form any pairing.

State transition equation:

For the interval \([i, j]\), consider all possibilities for base \(b_j\):

\[ dp[i][j] = \max \begin{cases} dp[i][j-1] & \text{($b_j$ unpaired)} \\ \max_{i \leq t < j-4, \, \text{match}(t, j)} \left( 1 + dp[i][t-1] + dp[t+1][j-1] \right) & \text{($b_j$ pairs with $b_t$)} \end{cases} \]

Here, \(\text{match}(t, j)\) indicates that \(b_t\) and \(b_j\) form a valid base pair. When \(b_j\) pairs with \(b_t\), the interval is split into two independent sub-intervals \([i, t-1]\) and \([t+1, j-1]\) — this is the core idea of interval DP.

Time complexity: \(O(n^3)\), Space complexity: \(O(n^2)\)

def rna_folding(s: str) -> int:
    n = len(s)
    pair = {('A','U'), ('U','A'), ('G','C'), ('C','G')}
    dp = [[0] * n for _ in range(n)]
    for length in range(5, n):               # gap j - i must exceed 4
        for i in range(n - length):
            j = i + length
            # b_j is unpaired
            dp[i][j] = dp[i][j - 1]
            # b_j pairs with some b_t
            for t in range(i, j - 4):
                if (s[t], s[j]) in pair:
                    val = 1 + (dp[i][t - 1] if t > i else 0) + dp[t + 1][j - 1]
                    dp[i][j] = max(dp[i][j], val)
    return dp[0][n - 1]

RNA Folding is an important problem in computational biology. The Nussinov algorithm is the classic algorithm based on the interval DP approach described above. In practice, more complex scoring models such as free energy minimization are also considered.

Matrix Chain Multiplication

Matrix Chain Multiplication: Given a chain of \(n\) matrices \(A_1, A_2, \ldots, A_n\), where matrix \(A_i\) has dimensions \(p_{i-1} \times p_i\), find the minimum number of scalar multiplications needed to compute their product \(A_1 A_2 \cdots A_n\). Since matrix multiplication is associative, different parenthesizations lead to different computational costs.

For example, consider three matrices \(A_{10\times30}\), \(B_{30\times5}\), \(C_{5\times60}\):

  • \((AB)C\): \(10\times30\times5 + 10\times5\times60 = 4500\) multiplications
  • \(A(BC)\): \(30\times5\times60 + 10\times30\times60 = 27000\) multiplications

Different parenthesizations lead to a 6x difference in cost.

State definition: \(dp[i][j]\) represents the minimum number of scalar multiplications needed to compute the matrix chain \(A_i A_{i+1} \cdots A_j\).

Base Case:

\[ dp[i][i] = 0 \quad (\text{a single matrix requires no multiplication}) \]

State transition equation:

\[ dp[i][j] = \min_{i \leq k < j} \left( dp[i][k] + dp[k+1][j] + p_{i-1} \cdot p_k \cdot p_j \right) \]

Enumerate the split point \(k\), dividing the matrix chain into \(A_i \cdots A_k\) and \(A_{k+1} \cdots A_j\). The total cost is the sum of each part's minimum cost plus the merging cost \(p_{i-1} \cdot p_k \cdot p_j\).

Computation order: Iterate by interval length from small to large.

Time complexity: \(O(n^3)\), Space complexity: \(O(n^2)\)

def matrix_chain_order(p: list[int]) -> int:
    """p: 维度数组,长度 n+1,矩阵 A_i 的维度为 p[i-1] x p[i]"""
    n = len(p) - 1
    dp = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):       # 链长度
        for i in range(n - length + 1):
            j = i + length - 1
            dp[i][j] = float('inf')
            for k in range(i, j):
                cost = dp[i][k] + dp[k + 1][j] + p[i] * p[k + 1] * p[j + 1]
                dp[i][j] = min(dp[i][j], cost)
    return dp[0][n - 1]

Knapsack DP

Knapsack DP: The core of this category is selecting items under a limited capacity constraint to maximize total value (or satisfy some other optimality criterion). Depending on the item selection rules, variants include the 0/1 Knapsack, Unbounded Knapsack, and Bounded Knapsack.

0/1 Knapsack Problem

0/1 Knapsack Problem: There are \(n\) items, each with a weight \(w_i\) and a value \(v_i\), and a knapsack with capacity \(W\). Each item can either be included or excluded (no splitting allowed). Find the maximum total value that can be packed without exceeding the knapsack capacity.

State definition: \(dp[i][j]\) represents the maximum value achievable by selecting from the first \(i\) items with a knapsack capacity of \(j\).

Base Case:

\[ dp[0][j] = 0, \quad dp[i][0] = 0 \]

With no items to choose from or zero capacity, the maximum value is 0.

State transition equation:

\[ dp[i][j] = \begin{cases} dp[i-1][j] & \text{if } w_i > j \quad \text{(item $i$ does not fit)} \\ \max\left( dp[i-1][j], \quad dp[i-1][j-w_i] + v_i \right) & \text{if } w_i \leq j \end{cases} \]

For each item, either skip it (inherit \(dp[i-1][j]\)) or take it (transition from the remaining capacity \(j - w_i\) and add the current item's value).

Time complexity: \(O(nW)\), Space complexity: \(O(nW)\) (can be optimized to \(O(W)\) using a rolling array)

def knapsack_01(weights: list[int], values: list[int], W: int) -> int:
    n = len(weights)
    # Space optimization: 1D DP, iterating in reverse so the same item is never reused
    dp = [0] * (W + 1)
    for i in range(n):
        for j in range(W, weights[i] - 1, -1):  # reverse order
            dp[j] = max(dp[j], dp[j - weights[i]] + values[i])
    return dp[W]

The key to the space optimization is iterating over capacity in reverse order: since \(dp[i][j]\) only depends on \(dp[i-1][...]\) (the previous row), reverse iteration ensures that when updating \(dp[j]\), the value \(dp[j - w_i]\) is still from the previous round (i.e., round \(i-1\)), guaranteeing that each item is selected at most once.

Coin Changing

Coin Change: Given coins of different denominations \(coins = [c_1, c_2, \ldots, c_m]\) and a total \(amount\), find the minimum number of coins needed to make up that amount. Each coin denomination can be used an unlimited number of times. Return \(-1\) if the amount cannot be made up.

State definition: \(dp[j]\) represents the minimum number of coins needed to make up amount \(j\).

Base Case:

\[ dp[0] = 0 \quad (\text{no coins are needed to make amount } 0) \]
\[ dp[j] = \infty, \quad j \geq 1 \quad (\text{initialized to infinity: not yet achievable}) \]

State transition equation:

\[ dp[j] = \min_{c \in coins, \, c \leq j} \left( dp[j - c] + 1 \right) \]

For each amount \(j\), try using each coin \(c\) and take the minimum across all valid choices.

Time complexity: \(O(amount \times m)\), Space complexity: \(O(amount)\)

def coin_change(coins: list[int], amount: int) -> int:
    dp = [float('inf')] * (amount + 1)
    dp[0] = 0
    for j in range(1, amount + 1):
        for c in coins:
            if c <= j:
                dp[j] = min(dp[j], dp[j - c] + 1)
    return dp[amount] if dp[amount] != float('inf') else -1

Note the relationship between Coin Change and the unbounded knapsack: Coin Change is essentially a variant of the unbounded knapsack problem, except the optimization objective changes from "maximum value" to "minimum number of items."

Unbounded Knapsack

Unbounded Knapsack: Similar to the 0/1 knapsack, there are \(n\) types of items, each with weight \(w_i\) and value \(v_i\), and a knapsack with capacity \(W\). The difference is that each type of item can be selected an unlimited number of times. Find the maximum total value.

State definition: \(dp[j]\) represents the maximum value with knapsack capacity \(j\).

Base Case:

\[ dp[0] = 0 \]

State transition equation:

\[ dp[j] = \max_{w_i \leq j} \left( dp[j - w_i] + v_i \right) \]

Key difference from the 0/1 knapsack: Capacity is iterated in forward order rather than reverse. Forward iteration means that within the same round, \(dp[j - w_i]\) may already include item \(i\), thereby achieving the effect of "each item type can be selected an unlimited number of times."

Time complexity: \(O(nW)\), Space complexity: \(O(W)\)

def unbounded_knapsack(weights: list[int], values: list[int], W: int) -> int:
    dp = [0] * (W + 1)
    for i in range(len(weights)):
        for j in range(weights[i], W + 1):  # forward order
            dp[j] = max(dp[j], dp[j - weights[i]] + values[i])
    return dp[W]

Knapsack Type      | Capacity Iteration Order        | Reason
0/1 Knapsack       | Reverse (from \(W\) to \(w_i\)) | Ensures each item is used at most once
Unbounded Knapsack | Forward (from \(w_i\) to \(W\)) | Allows each item to be used multiple times

Bounded Knapsack

Bounded Knapsack: There are \(n\) types of items, where item type \(i\) has weight \(w_i\), value \(v_i\), and a quantity limit \(c_i\). The knapsack capacity is \(W\). Find the maximum total value.

The naive approach is to expand item type \(i\) into \(c_i\) independent items and solve as a 0/1 knapsack, but the time complexity is \(O(W \sum c_i)\), which is inefficient when quantities are large.

Binary splitting optimization: Split each item's quantity \(c_i\) into several "bundles" with sizes \(1, 2, 4, 8, \ldots\) (binary decomposition), such that any subset-sum of these bundles can represent any integer from \(0\) to \(c_i\). For example, \(c_i = 13\) is split into \(1, 2, 4, 6\) (the last being the remainder \(13 - 1 - 2 - 4 = 6\)). After splitting, each "bundle" is treated as an independent 0/1 knapsack item.

This reduces the total number of items from \(\sum c_i\) to \(\sum \log c_i\), bringing the time complexity to \(O(W \sum \log c_i)\).

State transition equation: After splitting, apply the 0/1 knapsack approach:

\[ dp[j] = \max\left( dp[j], \quad dp[j - w'] + v' \right) \]

where \(w'\) and \(v'\) are the weight and value of each "bundle."

Time complexity: \(O(nW\log c_{max})\), Space complexity: \(O(W)\)

def bounded_knapsack(weights, values, counts, W):
    # Binary splitting of item counts
    new_w, new_v = [], []
    for i in range(len(weights)):
        c = counts[i]
        k = 1
        while k <= c:
            new_w.append(weights[i] * k)
            new_v.append(values[i] * k)
            c -= k
            k *= 2
        if c > 0:
            new_w.append(weights[i] * c)
            new_v.append(values[i] * c)
    # Solve as a 0/1 knapsack
    dp = [0] * (W + 1)
    for i in range(len(new_w)):
        for j in range(W, new_w[i] - 1, -1):  # reverse order
            dp[j] = max(dp[j], dp[j - new_w[i]] + new_v[i])
    return dp[W]

Knapsack Type      | Items per Type | Time Complexity
0/1 Knapsack       | 1              | \(O(nW)\)
Unbounded Knapsack | Unlimited      | \(O(nW)\)
Bounded Knapsack   | \(c_i\)        | \(O(nW\log c_{max})\) (with binary splitting)

2D-DP

2D-DP: These problems typically operate on a two-dimensional grid, where the state \(dp[i][j]\) represents some optimal value or number of ways to reach grid position \((i, j)\).

Unique Paths

Unique Paths: A robot is located at the top-left corner of an \(m \times n\) grid. It can only move either down or right one step at a time. How many distinct paths are there to reach the bottom-right corner?

State definition: \(dp[i][j]\) represents the number of distinct paths from the top-left corner to position \((i, j)\).

Base Case:

\[ dp[i][0] = 1, \quad dp[0][j] = 1 \]

Every position in the first row and first column can only be reached by a single path (moving entirely right or entirely down).

State transition equation:

\[ dp[i][j] = dp[i-1][j] + dp[i][j-1] \]

Position \((i, j)\) can only be reached from above \((i-1, j)\) or from the left \((i, j-1)\).

Time complexity: \(O(mn)\), Space complexity: \(O(n)\) (with row-wise rolling)

def unique_paths(m: int, n: int) -> int:
    dp = [1] * n
    for i in range(1, m):
        for j in range(1, n):
            dp[j] += dp[j - 1]
    return dp[n - 1]

This problem can also be solved directly using combinatorics: from top-left to bottom-right, exactly \(m + n - 2\) steps are taken, of which \(m - 1\) are downward and \(n - 1\) are rightward. Therefore, the answer is \(\binom{m+n-2}{m-1}\).
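
The combinatorial formula is a one-liner with the standard library (math.comb requires Python 3.8+; the function name is illustrative):

```python
import math

def unique_paths_comb(m: int, n: int) -> int:
    # Choose which m - 1 of the m + n - 2 steps are downward moves
    return math.comb(m + n - 2, m - 1)
```

For a 3 x 7 grid, both the DP and the formula give 28.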

Triangle Minimum Path Sum

Triangle Minimum Path Sum: Given a triangle (represented as a 2D list), find a path from top to bottom that minimizes the sum of numbers along the path. At each step, you may only move to an adjacent node in the row below (i.e., from position \((i, j)\) you can move to \((i+1, j)\) or \((i+1, j+1)\)).

State definition: \(dp[i][j]\) represents the minimum path sum from the top to position \((i, j)\).

Base Case:

\[ dp[0][0] = triangle[0][0] \]

State transition equation:

\[ dp[i][j] = triangle[i][j] + \begin{cases} dp[i-1][0] & \text{if } j = 0 \\ dp[i-1][j-1] & \text{if } j = i \\ \min\left( dp[i-1][j-1], \quad dp[i-1][j] \right) & \text{otherwise} \end{cases} \]

The final answer is the minimum of the last row: \(\min(dp[n-1][0], dp[n-1][1], \ldots, dp[n-1][n-1])\).

Time complexity: \(O(n^2)\), Space complexity: \(O(n)\) (can use bottom-up DP, operating directly on a 1D array)

The bottom-up approach is more concise: start from the last row and merge upward row by row; ultimately, \(dp[0]\) is the answer.

def minimum_total(triangle: list[list[int]]) -> int:
    dp = triangle[-1][:]  # start from the last row
    for i in range(len(triangle) - 2, -1, -1):
        for j in range(len(triangle[i])):
            dp[j] = triangle[i][j] + min(dp[j], dp[j + 1])
    return dp[0]

Tree DP

Tree DP is dynamic programming performed on tree structures. States are defined on tree nodes, and transitions typically proceed from child nodes to parent nodes (bottom-up). Since trees have no cycles, they naturally possess optimal substructure and the no-aftereffect property, making them highly suitable for DP.

Classic problem: Maximum Independent Set on a Tree

Given a tree with \(n\) nodes, each with a weight \(w_i\), select a subset of nodes such that no two selected nodes are adjacent (directly connected by an edge). Find the maximum total weight of selected nodes.

State definition:

  • \(dp[u][0]\): the maximum weight sum of the subtree rooted at \(u\), when node \(u\) is not selected
  • \(dp[u][1]\): the maximum weight sum of the subtree rooted at \(u\), when node \(u\) is selected

State transition equation:

\[ dp[u][0] = \sum_{v \in \text{children}(u)} \max\left( dp[v][0], \, dp[v][1] \right) \]
\[ dp[u][1] = w_u + \sum_{v \in \text{children}(u)} dp[v][0] \]

When \(u\) is not selected, each child may or may not be selected; when \(u\) is selected, none of its children can be selected.

Time complexity: \(O(n)\) — each node is visited exactly once.

def max_independent_set(adj, weights, root=0):
    n = len(weights)
    dp = [[0, 0] for _ in range(n)]
    visited = [False] * n

    def dfs(u):
        visited[u] = True
        dp[u][1] = weights[u]
        for v in adj[u]:
            if not visited[v]:
                dfs(v)
                dp[u][0] += max(dp[v][0], dp[v][1])
                dp[u][1] += dp[v][0]

    dfs(root)
    return max(dp[root][0], dp[root][1])

Tree DP In Depth

Tree DP uses subtrees as subproblems when performing DP on tree structures.

Classic problem: Maximum Independent Set on a Tree

Select as many nodes as possible from a tree such that no two adjacent nodes are both selected.

\[ dp[v][0] = \sum_{u \in \text{children}(v)} \max(dp[u][0], dp[u][1]) \]
\[ dp[v][1] = 1 + \sum_{u \in \text{children}(v)} dp[u][0] \]

Here \(dp[v][0]\) means node \(v\) is not selected, and \(dp[v][1]\) means node \(v\) is selected (contributing 1 to the count, since this version maximizes the number of nodes rather than a weight sum).

Rerooting DP: First compute with an arbitrary root, then use recurrence relations to reroot, avoiding recomputation for each node.
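A minimal rerooting sketch, using the classic "sum of distances to all other nodes" problem as the illustration (my choice of example, not from the text above): pass 1 solves the problem for an arbitrary root, and pass 2 derives each child's answer from its parent's in \(O(1)\).

```python
def sum_of_distances(adj):
    """For each node u, return the sum of distances from u to every node."""
    n = len(adj)
    size = [1] * n      # subtree sizes with root fixed at 0
    down = [0] * n      # down[u] = sum of distances from u into its subtree
    ans = [0] * n
    parent = [-1] * n

    order = []
    stack = [0]
    while stack:                     # iterative DFS to get a processing order
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v != parent[u]:
                parent[v] = u
                stack.append(v)

    for u in reversed(order):        # pass 1: bottom-up, root fixed at 0
        for v in adj[u]:
            if v != parent[u]:
                size[u] += size[v]
                down[u] += down[v] + size[v]

    ans[0] = down[0]
    for u in order:                  # pass 2: reroot from parent to child
        for v in adj[u]:
            if v != parent[u]:
                # Moving the root from u to v: the size[v] nodes in v's
                # subtree get closer by 1, the other n - size[v] get farther.
                ans[v] = ans[u] - size[v] + (n - size[v])
    return ans

# Path 0 - 1 - 2: distance sums are [3, 2, 3].
print(sum_of_distances([[1], [0, 2], [1]]))
```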

Bitmask DP

Bitmask DP uses binary bits to represent the state of a set. An \(n\)-bit integer can represent whether each element in a set of size \(n\) is selected (1 for selected, 0 for not), yielding a total of \(2^n\) states. This technique is applicable when \(n\) is small (typically \(n \leq 20\)).

Classic problem: Travelling Salesman Problem (TSP)

Given \(n\) cities and the distance \(dist[i][j]\) between any two cities, find the shortest path that starts from city 0, visits every city exactly once, and returns to city 0.

State definition: \(dp[S][i]\) represents the shortest path length when the set of visited cities is \(S\) (represented as a bitmask) and the current city is \(i\).

Base Case:

\[ dp[\{0\}][0] = 0 \quad \text{i.e., } dp[1][0] = 0 \]

Starting from city 0, with only city 0 visited, the path length is 0.

State transition equation:

\[ dp[S][i] = \min_{j \in S \setminus \{i\}} \left( dp[S \setminus \{i\}][j] + dist[j][i] \right), \quad i \in S \]

Remove city \(i\) from state \(S\), then enumerate the previously visited city \(j\).

The final answer is:

\[ \min_{i=1}^{n-1} \left( dp[2^n - 1][i] + dist[i][0] \right) \]

Time complexity: \(O(2^n \cdot n^2)\), Space complexity: \(O(2^n \cdot n)\)

def tsp(dist: list[list[int]]) -> int:
    n = len(dist)
    INF = float('inf')
    dp = [[INF] * n for _ in range(1 << n)]
    dp[1][0] = 0  # start at city 0; bitmask = ...0001

    for S in range(1, 1 << n):
        for i in range(n):
            if not (S >> i & 1) or dp[S][i] == INF:
                continue
            for j in range(n):
                if S >> j & 1:
                    continue  # j has already been visited
                dp[S | (1 << j)][j] = min(dp[S | (1 << j)][j],
                                          dp[S][i] + dist[i][j])

    full = (1 << n) - 1
    return min(dp[full][i] + dist[i][0] for i in range(1, n))
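As a sanity check, here is the function applied to a small 4-city instance (the distance matrix is made up for illustration; the function is repeated so the snippet runs standalone):

```python
def tsp(dist: list[list[int]]) -> int:
    n = len(dist)
    INF = float('inf')
    dp = [[INF] * n for _ in range(1 << n)]
    dp[1][0] = 0  # start at city 0; bitmask = ...0001

    for S in range(1, 1 << n):
        for i in range(n):
            if not (S >> i & 1) or dp[S][i] == INF:
                continue
            for j in range(n):
                if S >> j & 1:
                    continue  # j has already been visited
                dp[S | (1 << j)][j] = min(dp[S | (1 << j)][j],
                                          dp[S][i] + dist[i][j])

    full = (1 << n) - 1
    return min(dp[full][i] + dist[i][0] for i in range(1, n))

# 4 cities; the optimal tour is 0 -> 1 -> 3 -> 2 -> 0 with cost
# 10 + 25 + 30 + 15 = 80.
dist_example = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]
print(tsp(dist_example))  # 80
```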

TSP is NP-hard. Brute-force enumeration of all permutations has a complexity of \(O(n!)\). Bitmask DP optimizes this to \(O(2^n \cdot n^2)\), which is feasible for \(n \leq 20\).

Bitmask DP In Depth

Use binary bits to represent set states, where \(dp[S]\) denotes the optimal value when the visited set is \(S\).

Classic problem: TSP (Travelling Salesman Problem)

\[dp[S][i] = \min_{j \in S, j \neq i} \{dp[S \setminus \{i\}][j] + dist(j, i)\}\]

Time complexity: \(O(2^n \cdot n^2)\), applicable for \(n \leq 20\).

Bit manipulation tricks:

  • Check bit \(i\): S & (1 << i)
  • Set bit \(i\): S | (1 << i)
  • Clear bit \(i\): S & ~(1 << i)
  • Enumerate subsets: for (int sub = S; sub; sub = (sub - 1) & S)
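The subset-enumeration idiom is written above in its usual C form; since the rest of this page uses Python, here is the same trick as a small Python sketch:

```python
def subsets(S):
    """Enumerate all non-empty subsets of bitmask S in decreasing order."""
    subs = []
    sub = S
    while sub:
        subs.append(sub)
        sub = (sub - 1) & S  # drop to the next smaller subset of S
    return subs

# Subsets of 0b1011 (decimal 11):
print(subsets(0b1011))  # [11, 10, 9, 8, 3, 2, 1]
```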

Digit DP

Digit DP is used to solve counting problems related to the individual digits of numbers. A typical problem takes the form: how many numbers in the range \([L, R]\) satisfy a certain digit-related property? Examples include: "How many numbers in \([1, N]\) have a digit sum of \(S\)?", "How many numbers contain no digit 4?", etc.

Core idea: Construct numbers digit by digit, using a tight flag to indicate whether the current digit is still constrained by the upper bound \(N\). If all preceding digits have taken their maximum values (matching the corresponding digits of \(N\)), then the current digit can be at most the corresponding digit of \(N\) (tight = True); otherwise, the current digit can range from \(0\) to \(9\) (tight = False).

State definition: \(dp[pos][state][tight]\), where \(pos\) is the current digit position being processed, \(state\) is a problem-specific state (e.g., digit sum, whether a certain digit has appeared), and \(tight\) indicates whether the upper bound constraint is active.

General template (memoized search):

from functools import lru_cache

def count(N: int) -> int:
    """计算 [0, N] 中满足条件的数的个数"""
    digits = list(map(int, str(N)))

    @lru_cache(maxsize=None)
    def dfs(pos, state, tight, started):
        if pos == len(digits):
            return 1 if started else 0  # adjust per problem
        limit = digits[pos] if tight else 9
        result = 0
        for d in range(0, limit + 1):
            result += dfs(
                pos + 1,
                new_state(state, d),       # define the state transition per problem
                tight and (d == limit),
                started or (d > 0)
            )
        return result

    return dfs(0, initial_state, True, False)

The \(started\) flag is used to handle leading zeros. For range queries over \([L, R]\), the problem can be transformed using the prefix sum idea: \(count(R) - count(L - 1)\).

The time complexity of Digit DP is typically \(O(\log N \times |state| \times 10)\), where \(|state|\) is the size of the state space and depends on the specific problem. Compared to the brute-force \(O(N)\) approach, Digit DP offers enormous advantages when \(N\) is extremely large (e.g., \(10^{18}\)).
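As a concrete instantiation of the template, here is a sketch counting numbers in \([0, N]\) that contain no digit 4 (one of the example questions above). No extra state is needed beyond the tight flag, and leading zeros are harmless for this condition, so the started flag is dropped:

```python
from functools import lru_cache

def count_no_four(N: int) -> int:
    """Count numbers in [0, N] that contain no digit 4."""
    digits = list(map(int, str(N)))

    @lru_cache(maxsize=None)
    def dfs(pos, tight):
        if pos == len(digits):
            return 1  # a complete number with no 4 was built
        limit = digits[pos] if tight else 9
        total = 0
        for d in range(limit + 1):
            if d == 4:
                continue  # skip the forbidden digit
            total += dfs(pos + 1, tight and d == limit)
        return total

    return dfs(0, True)

print(count_no_four(100))  # 82
```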

Digit DP In Depth

Count numbers in \([1, N]\) satisfying a given condition by constructing digit by digit.

State design: \(dp[pos][tight][state]\)

  • \(pos\): current digit position
  • \(tight\): whether the upper bound constraint is still active
  • \(state\): problem-specific state (e.g., digit sum, whether a certain digit appears)

Template:

from functools import cache

def digit_dp(N):
    digits = [int(d) for d in str(N)]

    @cache
    def solve(pos, tight, state):
        if pos == len(digits):
            return check(state)
        limit = digits[pos] if tight else 9
        result = 0
        for d in range(0, limit + 1):
            result += solve(pos + 1, tight and d == limit, update(state, d))
        return result

    return solve(0, True, initial_state)

Applications: Count numbers containing specific digits within a range, numbers whose digit sum is a multiple of \(k\), etc.

Game DP

Game DP involves two or more players adopting optimal strategies. State transitions typically reflect the game-theoretic process of maximizing one's own payoff while minimizing the opponent's payoff.

Coin Game

Suppose there is a row of \(n\) coins with values given by an array \(A = [v_1, v_2, \ldots, v_n]\), where \(v_i\) is the value of the \(i\)-th coin and \(n\) is even.

Two players (Player 1 and Player 2) take turns picking a coin from the row. Player 1 goes first. On each turn, the current player must pick either the leftmost or the rightmost coin and add its value to their total score.

Both players play optimally: they both aim to maximize their own final total score (i.e., maximize their payoff relative to the opponent).

Goal: Design an algorithm to determine the maximum guaranteed payoff that Player 1 can achieve under optimal play.

We will use dynamic programming to solve this problem. Define \(MAX\_GAIN(i, j)\) as: when only the subarray \([v_i, v_{i+1}, \ldots, v_j]\) of array \(A\) remains (the length \(j - i + 1\) of this subarray must be a positive even number), the maximum guaranteed payoff the current player can obtain.

Solution:

This problem left the deepest impression on me during my algorithms course at university — I spent a long time unable to solve it. Let us walk through it together.

First, let us review the rules:

  • There is a row of coins represented by array A, of length n, where n is even, and each coin has a value
  • On each turn, only the leftmost or rightmost coin can be taken
  • The two players alternate turns, with Player 1 going first
  • Both players aim to maximize their own total value

The hardest part of this problem is that both players want to maximize their own total value, creating a game-theoretic situation. When making a choice, each player must simultaneously consider: after my move, what kind of remaining game will I leave for my opponent? How will the opponent exploit this to maximize their own payoff?

For example:

\[ A = [10, \quad 5, \quad 1, \quad 100] \]

Clearly, P1 will take 100 first, because if P1 takes 10, P2 would get the decisive 100. So P1 must take 100. After P1 takes 100, P2 will take 10, then P1 takes 5, and finally P2 takes 1.

As we can see, in any round, the current player will take the coin that leads to the maximum eventual payoff, i.e.:

\[ MAX\_GAIN = \max \left\{ \text{take-left path}, \quad \text{take-right path} \right\} \]

But what happens when there are more coins? Let us increase the number:

\[ A = [10, 5, 1, 100, 20, 50, 90, 100, 80, 30, 100, 30, 5, 10, 5, 10] \]

Clearly, it is impossible to tell at a glance which side P1 should take first, since the list contains multiple 100s and there is no single decisive move.

Let us think through this step by step. Let L be a sub-instance of A.

Start with the base case. When only two coins remain, \(L=[v_i, v_{i+1}]\), obviously:

\[ MAX\_GAIN(i, i+1) = \max(v_i, v_{i+1}) \]

Next, consider 4 remaining coins, \(L = [v_i, v_{i+1}, v_{i+2}, v_{i+3}]\). Regardless of whether P1 picks left or right, P2 will then pick whichever end maximizes P2's own payoff. Suppose P1 takes \(v_i\): P2 will take either \(v_{i+1}\) or \(v_{i+3}\). If P2 takes \(v_{i+1}\), the remaining pair reaches the base case and P1 gets \(MAX\_GAIN(i+2, i+3)\); if P2 takes \(v_{i+3}\), P1 gets \(MAX\_GAIN(i+1, i+2)\). Since P2 will choose whichever option benefits P2 more, P1 is left with the less favorable of the two. Thus P1's payoff becomes:

\[ \text{P1's payoff from taking }v_i = v_i + \min \left\{ MAX\_GAIN(i+2, i+3), \quad MAX\_GAIN(i+1, i+2) \right\} \]

Similarly, if P1 takes \(v_{i+3}\), P2 will take \(v_i\) or \(v_{i+2}\), leaving P1 with the worse of the two remaining pairs:

\[ \text{P1's payoff from taking }v_{i+3} = v_{i+3} + \min \left\{ MAX\_GAIN(i+1, i+2), \quad MAX\_GAIN(i, i+1) \right\} \]

Once we can formulate this induction, the problem is 90% solved. There are many possible inductive formulations, but the Minimax approach described above is essentially the only correct one, dictated by the nature of the problem. In standard Game Theory, when both players seek to maximize their own payoff, the Minimax principle applies: both P1 and P2 aim to maximize their own payoff. Since this is a zero-sum (or near zero-sum) game, the more P2 gains, the less is typically left for P1. Therefore, when P1 makes a decision, they must anticipate: "No matter what I do, my opponent P2 will choose the option most favorable to themselves from the remaining game state most unfavorable to me." P1's "maximum guaranteed payoff" means P1 must assume the worst case. From P1's perspective, the opponent P2's action minimizes P1's next-round payoff (\(\min\)).

In other words, from P1's perspective, their own turn and the opponent's turn amount to:

  1. On their own turn, maximize their payoff: \(MAX\_GAIN(i, j) = \max \left\{ \text{path } 1, \quad \text{path } 2 \right\}\)
  2. On the opponent's turn, P1 must assume the opponent will adopt the optimal strategy to minimize the future payoff left for P1. Therefore, P1 must factor in this worst-case \(\min\) result in the total payoff calculation: \(\text{path } 1 = v_i + \min \left\{ MAX\_GAIN(\dots), MAX\_GAIN(\dots) \right\}\)

Therefore, Minimax is the only correct approach for this problem. Once we can write the Minimax formula for L=4, we can generalize to arbitrary length L:

\[ MAX\_GAIN(i, j) = \max \begin{Bmatrix} \underbrace{v_i + \min \left( MAX\_GAIN(i+2, j), \quad MAX\_GAIN(i+1, j-1) \right)}_{\text{P1 takes } v_i}, \\ \underbrace{v_j + \min \left( MAX\_GAIN(i+1, j-1), \quad MAX\_GAIN(i, j-2) \right)}_{\text{P1 takes } v_j} \end{Bmatrix} \]

We use a 2D array (DP table) to store all computed \(MAX\_GAIN(i, j)\) values. \(DP[i][j]\) stores \(MAX\_GAIN(i, j)\), with size \(n \times n\).

We can now present the DP solution:

Base Case (L=2)

\[ DP[i][i+1] = \max(v_i, v_{i+1}) \]

State Transition:

\[ DP[i][j] = \max \begin{Bmatrix} v_i + \min \left( DP[i+2][j], \quad DP[i+1][j-1] \right), \\ v_j + \min \left( DP[i+1][j-1], \quad DP[i][j-2] \right) \end{Bmatrix} \]

During the solution process, we start from the shortest subarrays (\(L = 2\)) and gradually increase the length up to \(n\). In terms of computation order: the outer loop controls the length \(L\) (from \(2\) to \(n\), step \(2\)); the inner loop controls the starting index \(i\) (from \(0\) to \(n - L\)), covering every possible starting position of the subarray; inside the loop, the ending index \(j\) is derived from \(i\) and \(L\).

The length of the subarray can be written as:

\[ L = j - i + 1 \]

That is:

\[ j = i + L - 1 \]

After the loops complete, the final result is:

\[ \text{Player 1's maximum guaranteed payoff} = DP[0][n-1] \]
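The base case and state transition above translate directly into code (a sketch, 0-indexed, so the final answer sits in dp[0][n-1]):

```python
def max_gain(v: list[int]) -> int:
    """Player 1's maximum guaranteed payoff in the coin-row game (n even)."""
    n = len(v)
    dp = [[0] * n for _ in range(n)]
    for i in range(n - 1):                 # base case: L = 2
        dp[i][i + 1] = max(v[i], v[i + 1])
    for L in range(4, n + 1, 2):           # even lengths 4, 6, ..., n
        for i in range(n - L + 1):
            j = i + L - 1
            # P2 replies optimally, so P1 keeps only the min of what remains.
            take_left = v[i] + min(dp[i + 2][j], dp[i + 1][j - 1])
            take_right = v[j] + min(dp[i + 1][j - 1], dp[i][j - 2])
            dp[i][j] = max(take_left, take_right)
    return dp[0][n - 1]

# The example from the text: P1 takes 100, P2 takes 10, P1 takes 5, P2 takes 1.
print(max_gain([10, 5, 1, 100]))  # 105
```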


Tic-Tac-Toe

Tic-Tac-Toe is the classic noughts and crosses game. Since the state space of tic-tac-toe is extremely small, we can treat it as a memoized minimax problem and solve it using dynamic programming.

Define the subproblem: \(MINIMAX(Board, Player)\), which means: given the current board state (Board) and the current player's turn (Player, X or O), compute the terminal evaluation for the current player under optimal play (i.e., \(+1\), \(0\), or \(-1\)).

Base Case: The Minimax recursion terminates when a terminal game state is reached, returning the exact terminal evaluation:

| Terminal Condition  | Return Value | Meaning                                         |
| ------------------- | ------------ | ----------------------------------------------- |
| Current player wins | \(+1\)       | The best possible outcome for the current player.  |
| Opponent wins       | \(-1\)       | The worst possible outcome for the current player. |
| Draw                | \(0\)        | The board is full with no winner.               |

State transition formula:

The current player (MAX Player, e.g., X) always chooses the move that yields the maximum evaluation:

\[ MINIMAX(\text{Board}, \text{MAX}) = \max_{\text{Move} \in \text{Possible Moves}} \left\{ MINIMAX(\text{Board}_{\text{new}}, \text{MIN}) \right\} \]

Correspondingly, the opponent (MIN Player, e.g., O) always chooses the move that yields the minimum evaluation (minimizing MAX Player's payoff):

\[ MINIMAX(\text{Board}, \text{MIN}) = \min_{\text{Move} \in \text{Possible Moves}} \left\{ MINIMAX(\text{Board}_{\text{new}}, \text{MAX}) \right\} \]

Computation order: top-down DFS search combined with memoization:

  1. Start from the current board state (root node).
  2. Recursively search all possible next moves (downward).
  3. Continue until a base case (terminal game state) is reached.
  4. From the terminal states, backtrack the \(+1, 0, -1\) evaluations to parent nodes.
  5. Each parent node, depending on whether it is a \(\max\) or \(\min\) player, selects the \(\max\) or \(\min\) among its children's returned values as its own evaluation.
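The procedure above can be sketched as a memoized minimax (a minimal illustration; scores are fixed from X's point of view to match the MAX/MIN formulation, and the board is a 9-character string so states are hashable for the memo):

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals

def winner(board):
    for a, b, c in LINES:
        if board[a] != '_' and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def minimax(board, player):
    """Evaluation from X's point of view: +1 X wins, -1 O wins, 0 draw."""
    w = winner(board)
    if w == 'X':
        return 1
    if w == 'O':
        return -1
    if '_' not in board:
        return 0                               # full board, no winner: draw
    scores = []
    for i, cell in enumerate(board):
        if cell == '_':
            nxt = board[:i] + player + board[i + 1:]
            scores.append(minimax(nxt, 'O' if player == 'X' else 'X'))
    return max(scores) if player == 'X' else min(scores)

print(minimax('_' * 9, 'X'))  # 0: perfect play from both sides is a draw
```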

Above we stated that the opponent always chooses to minimize the MAX Player's payoff. At first glance this seems incorrect, since P2 is only maximizing their own payoff, not explicitly minimizing P1's. However, in game theory, because this game is a two-player zero-sum game, P2 maximizing their own payoff is mathematically equivalent to minimizing P1's payoff.

Let us define the payoffs:

  • \(S_1\): Player 1's final score.
  • \(S_2\): Player 2's final score.
  • \(T\): The total sum of all coins (or all points) in the game — a constant.

Since this is a zero-sum (or near zero-sum) game where all coins are eventually distributed and the total \(T\) is fixed:

\[ S_1 + S_2 = T \]

From P1's perspective, P1 must assume that P2 will maximize \(S_2\). Since \(S_2 = T - S_1\), to maximize \(S_2\), P2 must choose an action that maximizes \(T - S_1\), which in turn implies minimizing \(S_1\).


Let us analyze tic-tac-toe in greater depth. The dynamic programming used in tic-tac-toe essentially uses a memo to record states. During each execution, the game's P2 performs a DFS over the entire state space — this is fundamentally an exhaustive DFS over the game tree.

Let us examine a specific game play. We use indices 0-8 to represent the 9 positions on the board.

Suppose: Player X plays optimally by opening in the center (index 4).

Indices       Board
0 1 2         _ _ _
3 4 5         _ X _
6 7 8         _ _ _

Current board state (Level 1): [_, _, _, _, X, _, _, _, _] — P2 (O) to move (MAX layer)

Level 1: P2 (O) Decision (MAX action)

P2 tries all 8 empty positions to place O. This is P2's MAXIMIZING action, with the objective \(\max(+1, 0, -1)\).

\[ \text{Score} = \max \big\{ \text{Minimax}(\text{O plays 0}), \text{Minimax}(\text{O plays 1}), \dots, \text{Minimax}(\text{O plays 8}) \big\} \]

The corresponding states are:

| P2 (O) Action | Resulting State               | Next Level           |
| ------------- | ----------------------------- | -------------------- |
| O plays 0     | [O, _, _, _, X, _, _, _, _]   | \(\to\) Level 2 (MIN) |
| O plays 1     | [_, O, _, _, X, _, _, _, _]   | \(\to\) Level 2 (MIN) |
| O plays 2     | [_, _, O, _, X, _, _, _, _]   | \(\to\) Level 2 (MIN) |
| \(\dots\)     | \(\dots\)                     | \(\dots\)            |

Level 2: P1 (X) Prediction (MIN action)

The Minimax algorithm recurses to this level, predicting that opponent X will take the action that minimizes P2's payoff. This is the MINIMIZING action, with the objective \(\min(+1, 0, -1)\).

Suppose P2 chose O plays 0: [O, _, _, _, X, _, _, _, _]

P1 (X) now has 7 empty positions to play. Minimax iterates over these 7 actions:

\[ \text{Minimax}(\text{O plays 0}) = \min \big\{ \text{Minimax}(\text{X plays 1}), \dots, \text{Minimax}(\text{X plays 8}) \big\} \]

The states are:

| P1 (X) Predicted Action | Resulting State             | Next Level           |
| ----------------------- | --------------------------- | -------------------- |
| X plays 1               | [O, X, _, _, X, _, _, _, _] | \(\to\) Level 3 (MAX) |
| X plays 2               | [O, _, X, _, X, _, _, _, _] | \(\to\) Level 3 (MAX) |
| \(\dots\)               | \(\dots\)                   | \(\dots\)            |

Level 3: P2 (O) Decides Again (MAX action)

The Minimax algorithm recurses to this level, where P2 attempts to take the action that maximizes their payoff. P2 now has 6 empty positions to play.

\[ \text{Minimax}(\text{X plays 1}) = \max \big\{ \text{Minimax}(\text{O plays 2}), \dots, \text{Minimax}(\text{O plays 8}) \big\} \]

This process repeats until a leaf node is reached (win, loss, or draw), at which point the \(\mathbf{+1, 0, -1}\) evaluations are propagated back layer by layer, ultimately determining the best action at Level 1.

In simple terms, during P2's first DFS, the search reaches Level 8 where a full board is encountered — at that point, whether the result is +1, 0, or -1 is definitive. Throughout the search, all states and their corresponding results are recorded, and the maximum guaranteed value is retained (since P2 assumes both players will act to maximize their own interests). By the second or third search, because the memo has already stored states and their optimal values, P2 can more quickly determine the terminal evaluation and make the optimal choice.

The core reason the memoized values are reliable is:

  • For any given board state, its minimax evaluation is unique, so what the memo records is precisely the optimal subtree's terminal evaluation
  • The first search along any branch will necessarily reach a terminal state (base case), ensuring the accuracy of the result

It is precisely this memoization mechanism that allows the Minimax algorithm to reduce the amount of work from roughly 250,000 possible game sequences to just 5,478 unique board states.

In theory, any zero-sum game can produce a memo table recording the exact value of every state using the above method — whether it is chess, Chinese chess, or Go, all are fundamentally minimax games. This is Zermelo's Theorem:

If both sides play optimally, the initial position of chess necessarily has a determinate outcome: White (first mover) wins, Black (second mover) wins, or the game is drawn.

However, the total number of states in chess is approximately \(10^{43}\) to \(10^{50}\), and in Go approximately \(10^{170}\), while the upper limit of states that modern supercomputers can brute-force solve (i.e., fully compute and store Minimax evaluations) is roughly \(10^{14}\). The reason the total state count for chess is not an exact number is that chess may involve repeated positions leading to potential infinite loops, which is why chess has draw rules for threefold repetition and the fifty-move rule. The most famous estimate of chess's game tree complexity was given by Shannon in his 1950 paper "Programming a Computer for Playing Chess," and is known as the Shannon Number.

For more on Minimax and deeper explorations, please refer to the notes: Classical AI >> Adversarial Search and Games, where I systematically cover topics from minimax through chess and Go (AlphaGo) strategies and implementations.

