
Randomized and Approximation Algorithms

Overview

Randomized algorithms leverage randomness to simplify design, improve efficiency, or handle uncertainty. Approximation algorithms sacrifice exact optimality to provide solutions with quality guarantees in polynomial time. Both are important tools for tackling hard problems.


1. Classification of Randomized Algorithms

1.1 Las Vegas Algorithms

  • Results are always correct
  • Running time is a random variable
  • Expected running time is guaranteed

Example: Randomized quicksort

1.2 Monte Carlo Algorithms

  • Running time is deterministic (or has an upper bound)
  • Results may be incorrect, but the error probability is bounded

Example: Miller-Rabin primality test
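A minimal sketch of the test (the round count and function name are illustrative): a "composite" answer is always correct, while "probably prime" errs with probability at most \(4^{-\text{rounds}}\).

```python
import random

def miller_rabin(n, rounds=20):
    """Monte Carlo primality test: 'composite' is always correct;
    'probably prime' errs with probability at most 4**(-rounds)."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    # Write n - 1 as 2^s * d with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness of compositeness
    return True  # probably prime
```

Repeating the inner test `rounds` times drives the error probability down geometrically, which is exactly the "error reducible by repetition" property of Monte Carlo algorithms.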

graph TD
    RA[Randomized Algorithms] --> LV[Las Vegas]
    RA --> MC[Monte Carlo]
    LV --> LV1[Result always correct]
    LV --> LV2[Time is random]
    MC --> MC1[Time is deterministic]
    MC --> MC2[Result may be incorrect]
    MC --> MC3[Error probability reducible by repetition]

2. Randomized Quicksort

2.1 Algorithm

Randomly select the pivot element:

import random

def partition(arr, lo, hi):
    """Lomuto partition: arr[hi] is the pivot; returns its final index."""
    pivot, i = arr[hi], lo
    for j in range(lo, hi):
        if arr[j] < pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    arr[i], arr[hi] = arr[hi], arr[i]
    return i

def randomized_quicksort(arr, lo, hi):
    if lo < hi:
        # Swap a uniformly random pivot into the last position
        pivot_idx = random.randint(lo, hi)
        arr[pivot_idx], arr[hi] = arr[hi], arr[pivot_idx]
        p = partition(arr, lo, hi)
        randomized_quicksort(arr, lo, p - 1)
        randomized_quicksort(arr, p + 1, hi)

2.2 Expected Time Analysis

Theorem: The expected number of comparisons in randomized quicksort is \(O(n \log n)\).

Proof: Let the sorted elements be \(z_1 < z_2 < \cdots < z_n\). Define indicator random variables:

\[ X_{ij} = \begin{cases} 1 & \text{if } z_i \text{ and } z_j \text{ are compared} \\ 0 & \text{otherwise} \end{cases} \]

Total comparisons:

\[ X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij} \]

\(z_i\) and \(z_j\) are compared if and only if the first element chosen as pivot from \(\{z_i, z_{i+1}, \ldots, z_j\}\) is either \(z_i\) or \(z_j\), which occurs with probability \(\frac{2}{j - i + 1}\).

\[ E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j - i + 1} = \sum_{i=1}^{n-1} \sum_{k=2}^{n-i+1} \frac{2}{k} \leq 2n \sum_{k=2}^{n} \frac{1}{k} = O(n \log n) \]
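The bound can be sanity-checked empirically (a small self-contained harness; the constant 4 in the final check is a slack factor, not part of the theorem):

```python
import math
import random

def quicksort_comparisons(arr):
    """In-place randomized quicksort that counts element comparisons."""
    count = 0

    def sort(lo, hi):
        nonlocal count
        if lo >= hi:
            return
        # Random pivot, Lomuto partition
        pivot_idx = random.randint(lo, hi)
        arr[pivot_idx], arr[hi] = arr[hi], arr[pivot_idx]
        pivot, i = arr[hi], lo
        for j in range(lo, hi):
            count += 1  # one comparison against the pivot
            if arr[j] < pivot:
                arr[i], arr[j] = arr[j], arr[i]
                i += 1
        arr[i], arr[hi] = arr[hi], arr[i]
        sort(lo, i - 1)
        sort(i + 1, hi)

    sort(0, len(arr) - 1)
    return count

n = 1000
data = list(range(n))
random.shuffle(data)
c = quicksort_comparisons(data)
# Expected comparisons are about 2 n ln n, far below the n^2/2 worst case
```

For \(n = 1000\), \(2n \ln n \approx 13{,}800\), versus roughly \(500{,}000\) comparisons in the deterministic worst case.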

3. Bloom Filters

3.1 Principle

A Bloom filter is a probabilistic data structure that supports efficient set membership queries.

  • Structure: A bit array of length \(m\) + \(k\) independent hash functions
  • Insert: Set the \(k\) hashed positions to 1
  • Query: Check whether all \(k\) positions are 1

3.2 Error Rate

False positive probability (after inserting \(n\) elements):

\[ p \approx \left(1 - e^{-kn/m}\right)^k \]

Optimal number of hash functions:

\[ k^* = \frac{m}{n} \ln 2 \]

In this case, \(p \approx (0.6185)^{m/n}\).
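These formulas can be checked numerically (the parameters below are illustrative):

```python
import math

def bloom_false_positive(m, n, k):
    """Approximate false-positive rate after inserting n items
    into an m-bit filter that uses k hash functions."""
    return (1 - math.exp(-k * n / m)) ** k

m, n = 10_000, 1_000                 # 10 bits per element
k_opt = round(m / n * math.log(2))   # k* = (m/n) ln 2, about 7 here
p = bloom_false_positive(m, n, k_opt)
# p agrees with the closed form 0.6185^(m/n), roughly 0.008 for m/n = 10
```

Ten bits per element with seven hash functions already pushes the false-positive rate below 1%.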

class BloomFilter:
    def __init__(self, m, k, hash_funcs):
        self.bits = [False] * m
        self.m = m
        self.hash_funcs = hash_funcs[:k]

    def insert(self, item):
        for h in self.hash_funcs:
            self.bits[h(item) % self.m] = True

    def query(self, item):
        return all(self.bits[h(item) % self.m] for h in self.hash_funcs)

Characteristics:

  • No false negatives (if it says not present, it is definitely not present)
  • False positives possible (may falsely report presence)
  • Extremely space-efficient
  • Does not support deletion (use Counting Bloom Filter for that)

4. Skip Lists

4.1 Structure

A skip list is a randomized data structure where each node is promoted to the next level with probability \(p = 1/2\).

Level 3:  1 -----------------> 9
Level 2:  1 -------> 5 ------> 9
Level 1:  1 --> 3 -> 5 -> 7 -> 9
Level 0:  1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8 -> 9

4.2 Performance

| Operation | Expected Time | Worst-Case Time |
| --- | --- | --- |
| Search | \(O(\log n)\) | \(O(n)\) |
| Insert | \(O(\log n)\) | \(O(n)\) |
| Delete | \(O(\log n)\) | \(O(n)\) |

Expected number of levels: \(O(\log n)\); expected space: \(O(n)\).

Advantages: Simple implementation, supports range queries, no complex rebalancing operations. Redis uses skip lists to implement sorted sets.
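A minimal sketch of the structure (class names and the 16-level cap are my own choices; deletion is omitted for brevity):

```python
import random

class SkipNode:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # forward[i] = successor at level i

class SkipList:
    MAX_LEVEL = 16
    P = 0.5  # promotion probability

    def __init__(self):
        self.head = SkipNode(None, self.MAX_LEVEL)
        self.level = 1

    def _random_level(self):
        # Promote with probability P per level: level L with prob P^(L-1)
        lvl = 1
        while random.random() < self.P and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def search(self, key):
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

    def insert(self, key):
        # Record the rightmost node visited at each level
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = SkipNode(key, lvl)
        for i in range(lvl):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

sl = SkipList()
for key in [5, 1, 9, 3, 7]:
    sl.insert(key)
```

Note that no rebalancing ever happens: the random level chosen at insertion time is all the "balancing" a skip list needs.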


5. Approximation Algorithm Basics

5.1 Approximation Ratio

For a minimization problem, the approximation ratio \(\rho(n)\) of algorithm \(A\):

\[ \frac{A(I)}{OPT(I)} \leq \rho(n) \quad \forall \text{ instances } I \]

For a maximization problem:

\[ \frac{OPT(I)}{A(I)} \leq \rho(n) \]

5.2 Approximation Schemes

| Type | Definition | Running Time |
| --- | --- | --- |
| PTAS | \((1+\varepsilon)\)-approximation for any fixed \(\varepsilon > 0\) | \(n^{f(1/\varepsilon)}\) |
| FPTAS | \((1+\varepsilon)\)-approximation for any fixed \(\varepsilon > 0\) | \(\text{poly}(n, 1/\varepsilon)\) |

FPTAS is the stronger guarantee of the two: the running time is polynomial in both \(n\) and \(1/\varepsilon\), so the solution can be made arbitrarily close to optimal at polynomial cost.


6. 2-Approximation for Vertex Cover

6.1 Problem

Given a graph \(G = (V, E)\), find the minimum vertex cover (every edge has at least one endpoint selected).

6.2 Greedy Matching Algorithm

def vertex_cover_2approx(graph):
    """Vertex cover 2-approximation algorithm"""
    cover = set()
    edges = set(graph.edges())

    while edges:
        # Pick any edge
        u, v = next(iter(edges))
        cover.add(u)
        cover.add(v)
        # Remove all edges incident to u or v
        edges = {(a, b) for (a, b) in edges if a != u and a != v and b != u and b != v}

    return cover

6.3 Approximation Ratio Proof

Let the algorithm pick \(k\) edges. Each pick removes all edges incident to the chosen endpoints, so the picked edges are pairwise disjoint and form a maximal matching \(M\).

  • Algorithm output: \(|C| = 2k\)
  • Any cover must include at least one endpoint of each edge in \(M\); since the edges of \(M\) are disjoint, this requires \(k\) distinct vertices, so \(OPT \geq k\)
  • Therefore \(|C| = 2k \leq 2 \cdot OPT\)

Approximation ratio \(\rho = 2\). \(\square\)
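The listing above assumes a `graph` object; the same idea on a plain edge set, exercised on a star graph \(K_{1,4}\) whose optimal cover is the single center vertex:

```python
def vertex_cover_2approx(edges):
    """2-approximation: take both endpoints of a maximal matching."""
    edges = set(edges)
    cover = set()
    while edges:
        u, v = next(iter(edges))  # pick any remaining edge
        cover |= {u, v}
        # Drop every edge touching u or v
        edges = {e for e in edges if u not in e and v not in e}
    return cover

# Star graph: center 0, leaves 1..4; OPT = {0}
edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
cover = vertex_cover_2approx(edges)
# cover has exactly 2 vertices = 2 * OPT, matching the bound
```

On a star graph the algorithm always returns exactly twice the optimum, so the factor 2 is tight.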


7. Greedy Approximation for Set Cover

7.1 Problem

Universe \(U\), subfamily \(\{S_1, \ldots, S_m\}\); select the fewest subsets to cover \(U\).

7.2 Greedy Algorithm

def greedy_set_cover(universe, subsets):
    """Set cover greedy algorithm"""
    covered = set()
    chosen = []

    while covered != universe:
        # Select the subset covering the most uncovered elements
        best = max(subsets, key=lambda s: len(s - covered))
        if not best - covered:
            raise ValueError("subsets do not cover the universe")
        chosen.append(best)
        covered |= best

    return chosen

7.3 Approximation Ratio

Theorem: The greedy algorithm has approximation ratio \(H_n = \sum_{i=1}^{n} \frac{1}{i} = O(\ln n)\), where \(n = |U|\).

Proof sketch: Use a charging argument. When the greedy algorithm first covers the \(k\)-th element, at least \(n - k + 1\) elements are still uncovered, and the optimal solution covers them with \(OPT\) sets, so some available set covers at least \(\frac{n-k+1}{OPT}\) uncovered elements. Spreading the cost of 1 for each greedy selection evenly over its newly covered elements, the \(k\)-th element is charged at most \(\frac{OPT}{n-k+1}\); summing over \(k\) gives total cost \(\leq OPT \cdot H_n\).

Optimality

Under the \(P \neq NP\) assumption, set cover cannot be approximated in polynomial time to better than \((1-\varepsilon) \ln n\) (Dinur & Steurer 2014). Thus the greedy algorithm's approximation ratio is essentially optimal.
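A worked instance (restating the greedy routine so the snippet is self-contained) in which greedy exceeds the optimum while staying within \(H_n\):

```python
def greedy_set_cover(universe, subsets):
    """Pick the subset covering the most uncovered elements until done."""
    covered, chosen = set(), []
    while covered != universe:
        best = max(subsets, key=lambda s: len(s - covered))
        if not best - covered:
            raise ValueError("subsets do not cover the universe")
        chosen.append(best)
        covered |= best
    return chosen

universe = {1, 2, 3, 4, 5, 6, 7, 8}
# Optimum: {1,2,3,4} + {5,6,7,8}, i.e. 2 sets.
# The 5-element decoy lures greedy into a 3-set cover.
subsets = [{1, 2, 3, 5, 6}, {1, 2, 3, 4}, {5, 6, 7, 8}]
chosen = greedy_set_cover(universe, subsets)
```

Greedy picks the decoy first, then needs both optimal sets anyway: 3 sets against an optimum of 2, safely below the \(H_8 \approx 2.72\) bound.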


8. Randomized Approximation for MAX-SAT

8.1 Problem

Given a CNF formula, maximize the number of satisfied clauses.

8.2 Random Assignment Algorithm

Independently set each variable to TRUE/FALSE with probability \(1/2\).

Theorem: Let the formula have \(m\) clauses, each with at least \(k\) literals. The expected number of satisfied clauses under random assignment is \(\geq (1 - 2^{-k}) \cdot m\).

Proof: Clause \(C_j\) has \(k_j\) literals; the probability that \(C_j\) is not satisfied is \(2^{-k_j}\).

\[ E[\text{satisfied clauses}] = \sum_{j=1}^{m} (1 - 2^{-k_j}) \geq \sum_{j=1}^{m} (1 - 2^{-k}) = (1 - 2^{-k})m \]

For 3-SAT (\(k=3\)): expected \(\frac{7}{8}m\) satisfied clauses, giving approximation ratio \(\frac{7}{8}\).

8.3 Derandomization

Method of conditional expectations: Determine variable values one by one, choosing at each step the assignment that does not decrease the conditional expectation. This converts the randomized algorithm into a deterministic one while maintaining the same approximation ratio.
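The method can be sketched as follows (the clause encoding and function names are my own; literals are signed integers, so `3` means \(x_3\) and `-3` means \(\neg x_3\)):

```python
def expected_satisfied(clauses, assignment):
    """E[# satisfied clauses] when variables in `assignment` are fixed
    and the remaining variables are set uniformly at random."""
    total = 0.0
    for clause in clauses:
        free = 0
        satisfied = False
        for lit in clause:
            var = abs(lit)
            if var in assignment:
                if assignment[var] == (lit > 0):
                    satisfied = True
                    break
            else:
                free += 1
        # An unsatisfied clause with `free` random literals fails w.p. 2^-free
        total += 1.0 if satisfied else 1.0 - 0.5 ** free
    return total

def derandomized_maxsat(clauses, num_vars):
    """Method of conditional expectations: fix variables one at a time,
    never letting the conditional expectation decrease."""
    assignment = {}
    for var in range(1, num_vars + 1):
        e_true = expected_satisfied(clauses, {**assignment, var: True})
        e_false = expected_satisfied(clauses, {**assignment, var: False})
        assignment[var] = e_true >= e_false
    return assignment

clauses = [[1, 2], [-1, 3], [-2, -3], [1, -3]]
assignment = derandomized_maxsat(clauses, 3)
satisfied = sum(
    any(assignment[abs(l)] == (l > 0) for l in clause) for clause in clauses
)
# Guaranteed: satisfied >= E[X] = 4 * (1 - 2^-2) = 3 for these clauses
```

Because one of the two fixings always attains at least the current conditional expectation, the final deterministic assignment satisfies at least \(\sum_j (1 - 2^{-k_j})\) clauses.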


9. FPTAS for the Knapsack Problem

9.1 Dynamic Programming

Standard DP runs in \(O(nW)\) time, where \(W\) is the knapsack capacity. This is only pseudo-polynomial: \(W\) is encoded in \(O(\log W)\) bits, so the running time can be exponential in the input size.

9.2 FPTAS Approach

  1. Scale all values: \(\hat{v}_i = \lfloor v_i \cdot \frac{n}{\varepsilon v_{\max}} \rfloor\)
  2. Run DP on the scaled values
  3. Running time: \(O(n^3 / \varepsilon)\)
  4. Approximation ratio: \((1 - \varepsilon)\)
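A sketch of the scheme under the scaling above (the function name and the value-indexed DP layout are illustrative; the selected item set itself is not recovered here):

```python
def knapsack_fptas(values, weights, capacity, eps):
    """FPTAS sketch: scale values by mu = eps * v_max / n, then run a
    DP over total scaled value, minimizing weight."""
    n = len(values)
    v_max = max(values)
    mu = eps * v_max / n
    scaled = [int(v // mu) for v in values]
    V = sum(scaled)
    INF = float("inf")
    # dp[v] = minimum weight needed to reach scaled value exactly v
    dp = [0] + [INF] * V
    for i in range(n):
        for v in range(V, scaled[i] - 1, -1):
            if dp[v - scaled[i]] + weights[i] < dp[v]:
                dp[v] = dp[v - scaled[i]] + weights[i]
    best = max(v for v in range(V + 1) if dp[v] <= capacity)
    # Rounding loses at most mu per item, i.e. at most eps * v_max total,
    # so best * mu >= (1 - eps) * OPT
    return best * mu

values, weights, capacity = [60, 100, 120], [10, 20, 30], 50
approx = knapsack_fptas(values, weights, capacity, eps=0.1)
```

The DP table has \(O(n \cdot n/\varepsilon)\) entries and each of the \(n\) items sweeps it once, giving the stated \(O(n^3/\varepsilon)\) running time.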

10. Summary Comparison

| Approach | Applicable Scenarios | Guarantees |
| --- | --- | --- |
| Randomized algorithms | Need simplification or speedup | Expected time / probabilistic correctness |
| 2-approximation | Vertex cover, etc. | Constant factor |
| \(O(\ln n)\)-approximation | Set cover, etc. | Logarithmic factor |
| PTAS/FPTAS | Knapsack, etc. | Arbitrary precision |
| Randomized rounding | MAX-SAT, etc. | Constant factor |

graph LR
    NPC[NP-Complete Problems] --> A1[Exact Algorithms<br>Exponential time]
    NPC --> A2[Approximation Algorithms<br>Polynomial time + quality guarantee]
    NPC --> A3[Randomized Algorithms<br>Probabilistic guarantee]
    NPC --> A4[Heuristic Algorithms<br>No theoretical guarantee]
    A2 --> FPTAS
    A2 --> PTAS
    A2 --> CONST[Constant approx. ratio]
    A2 --> LOG[Logarithmic approx. ratio]

References

  • Motwani & Raghavan, Randomized Algorithms
  • Vazirani, Approximation Algorithms
  • Williamson & Shmoys, The Design of Approximation Algorithms
