Randomized and Approximation Algorithms
Overview
Randomized algorithms leverage randomness to simplify design, improve efficiency, or handle uncertainty. Approximation algorithms sacrifice exact optimality to provide solutions with quality guarantees in polynomial time. Both are important tools for tackling hard problems.
1. Classification of Randomized Algorithms
1.1 Las Vegas Algorithms
- Results are always correct
- Running time is a random variable
- Expected running time is guaranteed
Example: Randomized quicksort
1.2 Monte Carlo Algorithms
- Running time is deterministic (or has an upper bound)
- Results may be incorrect, but the error probability is bounded
Example: Miller-Rabin primality test
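For concreteness, here is a minimal Miller-Rabin sketch (a one-sided Monte Carlo test: a "composite" verdict is always correct, while a "probably prime" verdict errs with probability at most \(4^{-\text{rounds}}\); the structure is standard, the details are illustrative):

```python
import random

def miller_rabin(n, rounds=20):
    """Monte Carlo primality test with one-sided error."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7):
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2^s with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # witness found: definitely composite
    return True  # probably prime
```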
```mermaid
graph TD
    RA[Randomized Algorithms] --> LV[Las Vegas]
    RA --> MC[Monte Carlo]
    LV --> LV1[Result always correct]
    LV --> LV2[Time is random]
    MC --> MC1[Time is deterministic]
    MC --> MC2[Result may be incorrect]
    MC --> MC3[Error probability reducible by repetition]
```
2. Randomized Quicksort
2.1 Algorithm
Randomly select the pivot element:
```python
import random

def partition(arr, lo, hi):
    # Lomuto partition around arr[hi]; returns the pivot's final index
    pivot = arr[hi]
    i = lo - 1
    for j in range(lo, hi):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[hi] = arr[hi], arr[i + 1]
    return i + 1

def randomized_quicksort(arr, lo, hi):
    if lo < hi:
        # Swap a uniformly random pivot into position hi
        pivot_idx = random.randint(lo, hi)
        arr[pivot_idx], arr[hi] = arr[hi], arr[pivot_idx]
        p = partition(arr, lo, hi)
        randomized_quicksort(arr, lo, p - 1)
        randomized_quicksort(arr, p + 1, hi)
```
2.2 Expected Time Analysis
Theorem: The expected number of comparisons in randomized quicksort is \(O(n \log n)\).
Proof: Let the sorted elements be \(z_1 < z_2 < \cdots < z_n\). Define indicator random variables \(X_{ij} = \mathbb{1}\{z_i \text{ is compared with } z_j\}\) for \(i < j\).
\(z_i\) and \(z_j\) are compared if and only if the first element chosen as pivot from \(\{z_i, z_{i+1}, \ldots, z_j\}\) is either \(z_i\) or \(z_j\), which occurs with probability \(\frac{2}{j - i + 1}\).
Total comparisons, by linearity of expectation:
\[
E\Big[\sum_{i < j} X_{ij}\Big] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j - i + 1} \le \sum_{i=1}^{n-1} \sum_{k=2}^{n} \frac{2}{k} = O(n \log n). \;\square
\]
3. Bloom Filters
3.1 Principle
A Bloom filter is a probabilistic data structure that supports efficient set membership queries.
- Structure: A bit array of length \(m\) together with \(k\) independent hash functions
- Insert: Set the \(k\) hashed positions to 1
- Query: Check whether all \(k\) positions are 1
3.2 Error Rate
False positive probability (after inserting \(n\) elements):
\[
p \approx \left(1 - e^{-kn/m}\right)^{k}
\]
Optimal number of hash functions:
\[
k^{*} = \frac{m}{n} \ln 2
\]
In this case, \(p \approx (0.6185)^{m/n}\). For example, at \(m/n = 10\) bits per element, the optimum is \(k^* \approx 7\) and \(p \approx 0.8\%\).
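These formulas can be inverted to size a filter: given \(n\) and a target false-positive rate \(p\), take \(m = \lceil -n \ln p / (\ln 2)^2 \rceil\). A small helper (illustrative name, not a library function):

```python
import math

def bloom_params(n, p):
    """Bit-array size m and hash count k for n items at target FP rate p."""
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    k = max(1, round(m / n * math.log(2)))
    return m, k

print(bloom_params(1_000_000, 0.01))  # roughly (9585059, 7): ~1.2 MB for 1M items at 1% FP
```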
```python
class BloomFilter:
    def __init__(self, m, k, hash_funcs):
        self.bits = [False] * m
        self.m = m
        self.hash_funcs = hash_funcs[:k]  # use the first k hash functions

    def insert(self, item):
        for h in self.hash_funcs:
            self.bits[h(item) % self.m] = True

    def query(self, item):
        # True means "possibly present"; False means "definitely absent"
        return all(self.bits[h(item) % self.m] for h in self.hash_funcs)
```
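A quick usage sketch with two ad-hoc hash functions built from hashlib (illustrative; production filters use independent, fast hashes such as MurmurHash):

```python
import hashlib

def h1(item):
    return int(hashlib.md5(item.encode()).hexdigest(), 16)

def h2(item):
    return int(hashlib.sha1(item.encode()).hexdigest(), 16)

bf = BloomFilter(m=1000, k=2, hash_funcs=[h1, h2])
bf.insert("alice")
print(bf.query("alice"))  # True
print(bf.query("bob"))    # False with high probability (false positives possible)
```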
Characteristics:
- No false negatives (if it says not present, it is definitely not present)
- False positives possible (may falsely report presence)
- Extremely space-efficient
- Does not support deletion (use Counting Bloom Filter for that)
4. Skip Lists
4.1 Structure
A skip list is a randomized data structure built as a hierarchy of linked lists: every element appears at level 0, and each node at level \(i\) is promoted to level \(i+1\) with probability \(p = 1/2\).
```
Level 3: 1 -----------------> 9
Level 2: 1 -------> 5 ------> 9
Level 1: 1 --> 3 -> 5 -> 7 -> 9
Level 0: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8 -> 9
```
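A compact sketch of search and insert following this promotion rule (the level cap of 16 and all names are illustrative):

```python
import random

class SkipNode:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * (level + 1)  # forward[i]: next node at level i

class SkipList:
    MAX_LEVEL = 16
    P = 0.5  # promotion probability

    def __init__(self):
        self.head = SkipNode(None, self.MAX_LEVEL)
        self.level = 0

    def _random_level(self):
        lvl = 0
        while random.random() < self.P and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def search(self, key):
        node = self.head
        for i in range(self.level, -1, -1):   # descend level by level
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

    def insert(self, key):
        update = [self.head] * (self.MAX_LEVEL + 1)
        node = self.head
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node                  # last node before key at level i
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = SkipNode(key, lvl)
        for i in range(lvl + 1):              # splice in at levels 0..lvl
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

sl = SkipList()
for x in [3, 7, 1, 9, 5]:
    sl.insert(x)
print(sl.search(7), sl.search(4))  # True False
```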
4.2 Performance
| Operation | Expected Time | Worst-Case Time |
|---|---|---|
| Search | \(O(\log n)\) | \(O(n)\) |
| Insert | \(O(\log n)\) | \(O(n)\) |
| Delete | \(O(\log n)\) | \(O(n)\) |
Expected number of levels: \(O(\log n)\); expected space: \(O(n)\).
Advantages: Simple implementation, supports range queries, no complex rebalancing operations. Redis uses skip lists to implement sorted sets.
5. Approximation Algorithm Basics
5.1 Approximation Ratio
For a minimization problem, the approximation ratio \(\rho(n)\) of algorithm \(A\) is
\[
\rho(n) = \max_{|x| = n} \frac{A(x)}{OPT(x)}
\]
For a maximization problem:
\[
\rho(n) = \max_{|x| = n} \frac{OPT(x)}{A(x)}
\]
In both cases \(\rho(n) \geq 1\); the closer to 1, the better the guarantee.
5.2 Approximation Schemes
| Type | Definition | Running Time |
|---|---|---|
| PTAS | \((1+\varepsilon)\)-approximation for any \(\varepsilon > 0\) | \(n^{f(1/\varepsilon)}\) |
| FPTAS | \((1+\varepsilon)\)-approximation for any \(\varepsilon > 0\) | \(\text{poly}(n, 1/\varepsilon)\) |
FPTAS gives the strongest approximation guarantee: the running time is polynomial in both \(n\) and \(1/\varepsilon\), and the solution can be made arbitrarily close to optimal.
6. 2-Approximation for Vertex Cover
6.1 Problem
Given a graph \(G = (V, E)\), find the minimum vertex cover (every edge has at least one endpoint selected).
6.2 Greedy Matching Algorithm
```python
def vertex_cover_2approx(graph):
    """Vertex cover 2-approximation; `graph` is any object whose .edges()
    yields 2-tuples of endpoints (e.g. a networkx.Graph)."""
    cover = set()
    edges = set(graph.edges())
    while edges:
        # Pick any remaining edge and take BOTH endpoints
        u, v = next(iter(edges))
        cover.add(u)
        cover.add(v)
        # Remove all edges incident to u or v
        edges = {(a, b) for (a, b) in edges
                 if u not in (a, b) and v not in (a, b)}
    return cover
```
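For example, on a 5-vertex path graph (assuming networkx is available; the exact output varies with edge iteration order):

```python
import networkx as nx

G = nx.path_graph(5)            # edges: (0,1), (1,2), (2,3), (3,4)
print(vertex_cover_2approx(G))  # e.g. {0, 1, 2, 3}; the optimum {1, 3} has size 2
```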
6.3 Approximation Ratio Proof
Let the algorithm pick \(k\) edges. Whenever an edge \((u, v)\) is picked, all edges incident to \(u\) or \(v\) are removed, so the picked edges share no endpoints: they form a maximal matching \(M\).
- Algorithm output: \(|C| = 2k\)
- The optimal cover must cover each edge in \(M\); since these \(k\) edges are pairwise disjoint, this requires at least \(k\) distinct vertices, so \(OPT \geq k\)
- Therefore \(|C| = 2k \leq 2 \cdot OPT\)
Approximation ratio \(\rho = 2\). \(\square\)
7. Greedy Approximation for Set Cover
7.1 Problem
Given a universe \(U\) and a family of subsets \(S_1, \ldots, S_m \subseteq U\) whose union is \(U\), select the fewest subsets that together cover \(U\).
7.2 Greedy Algorithm
```python
def greedy_set_cover(universe, subsets):
    """Set cover greedy algorithm (assumes the subsets can cover universe)."""
    covered = set()
    chosen = []
    while covered != universe:
        # Select the subset covering the most uncovered elements
        best = max(subsets, key=lambda s: len(s - covered))
        chosen.append(best)
        covered |= best
    return chosen
```
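A quick check of the greedy rule on a toy instance (values are illustrative):

```python
universe = set(range(1, 11))
subsets = [set(range(1, 7)), set(range(5, 11)), {1, 2, 3}, {8, 9, 10}]
for s in greedy_set_cover(universe, subsets):
    print(sorted(s))
# [1, 2, 3, 4, 5, 6]   -- covers 6 new elements
# [5, 6, 7, 8, 9, 10]  -- covers the remaining 4
```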
7.3 Approximation Ratio
Theorem: The greedy algorithm has approximation ratio \(H_n = \sum_{i=1}^{n} \frac{1}{i} = O(\ln n)\), where \(n = |U|\).
Proof sketch: Use a charging argument: each greedy selection costs 1, split evenly among its newly covered elements. When the \(k\)-th element (in coverage order) is covered, at least \(n - k + 1\) elements were still uncovered, and some set of the optimal solution covered at least \(\frac{n - k + 1}{OPT}\) of them; the greedy choice covers at least as many, so the element is charged at most \(\frac{OPT}{n - k + 1}\). Summing over all elements gives total cost \(\leq OPT \cdot H_n\).
Optimality
Under the \(P \neq NP\) assumption, set cover cannot be approximated in polynomial time to better than \((1-\varepsilon) \ln n\) (Dinur & Steurer 2014). Thus the greedy algorithm's approximation ratio is essentially optimal.
8. Randomized Approximation for MAX-SAT
8.1 Problem
Given a CNF formula, maximize the number of satisfied clauses.
8.2 Random Assignment Algorithm
Independently set each variable to TRUE/FALSE with probability \(1/2\).
Theorem: Let the formula have \(m\) clauses, each with at least \(k\) literals. The expected number of satisfied clauses under random assignment is \(\geq (1 - 2^{-k}) \cdot m\).
Proof: Clause \(C_j\) has \(k_j \geq k\) literals over distinct variables; it is unsatisfied only when all \(k_j\) independent literals come out false, which happens with probability \(2^{-k_j} \leq 2^{-k}\). By linearity of expectation, the expected number of satisfied clauses is \(\sum_j (1 - 2^{-k_j}) \geq (1 - 2^{-k})\, m\). \(\square\)
For 3-SAT (\(k = 3\)): expected \(\frac{7}{8}m\) satisfied clauses; since \(OPT \leq m\), this gives approximation ratio \(\frac{7}{8}\).
8.3 Derandomization
Method of conditional expectations: Determine variable values one by one, choosing at each step the assignment that does not decrease the conditional expectation. This converts the randomized algorithm into a deterministic one while maintaining the same approximation ratio.
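A sketch of this derandomization, assuming a DIMACS-style clause encoding (literal \(v\) means \(x_v\), \(-v\) means \(\lnot x_v\)); the function names are illustrative:

```python
def expected_satisfied(clauses, assignment):
    """E[# satisfied clauses] when every unassigned variable is set
    uniformly at random; `assignment` maps var -> bool for fixed vars."""
    total = 0.0
    for clause in clauses:
        unset, satisfied = 0, False
        for lit in clause:
            var = abs(lit)
            if var in assignment:
                if assignment[var] == (lit > 0):
                    satisfied = True
                    break
            else:
                unset += 1
        if satisfied:
            total += 1.0
        elif unset:
            total += 1.0 - 0.5 ** unset  # Pr[some free literal comes out true]
    return total

def derandomized_max_sat(clauses, num_vars):
    assignment = {}
    for v in range(1, num_vars + 1):
        assignment[v] = True
        e_true = expected_satisfied(clauses, assignment)
        assignment[v] = False
        e_false = expected_satisfied(clauses, assignment)
        # Keep whichever value does not decrease the conditional expectation
        assignment[v] = e_true >= e_false
    return assignment

clauses = [[1, 2, 3], [-1, 2], [-2, 3], [1, -3]]
print(derandomized_max_sat(clauses, 3))  # satisfies all 4 clauses here
```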
9. FPTAS for the Knapsack Problem
9.1 Dynamic Programming
The standard DP over weights runs in \(O(nW)\); this is pseudo-polynomial time, since \(W\) can be exponential in the input size (the length of its binary encoding). An alternative DP over total value runs in \(O(n \sum_i v_i)\) and is the basis for the FPTAS below.
9.2 FPTAS Approach
- Scale all values: \(\hat{v}_i = \lfloor v_i \cdot \frac{n}{\varepsilon v_{\max}} \rfloor\)
- Run DP on the scaled values
- Running time: \(O(n^3 / \varepsilon)\)
- Approximation ratio: \((1 - \varepsilon)\)
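A minimal sketch of the scaling step plus the value-indexed DP (illustrative, not a tuned implementation):

```python
def knapsack_fptas(values, weights, W, eps):
    """Returns a value guaranteed to be >= (1 - eps) * OPT."""
    n, vmax = len(values), max(values)
    K = eps * vmax / n                        # scaling factor
    vhat = [int(v / K) for v in values]       # floor-scaled values
    V = sum(vhat)
    INF = float("inf")
    dp = [0] + [INF] * V                      # dp[t] = min weight for scaled value t
    for i in range(n):
        for t in range(V, vhat[i] - 1, -1):
            dp[t] = min(dp[t], dp[t - vhat[i]] + weights[i])
    best = max(t for t in range(V + 1) if dp[t] <= W)
    return best * K                           # lower bound on the achieved true value

print(knapsack_fptas([60, 100, 120], [10, 20, 30], W=50, eps=0.1))  # 220.0 (OPT = 220)
```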
10. Summary Comparison
| Approach | Applicable Scenarios | Guarantees |
|---|---|---|
| Randomized algorithms | Need simplification or speedup | Expected time / probabilistic correctness |
| 2-approximation | Vertex cover, etc. | Constant factor |
| \(O(\ln n)\)-approximation | Set cover, etc. | Logarithmic factor |
| PTAS/FPTAS | Knapsack, etc. | Arbitrary precision |
| Randomized rounding | MAX-SAT, etc. | Constant factor |
```mermaid
graph LR
    NPC[NP-Complete Problems] --> A1[Exact Algorithms<br>Exponential time]
    NPC --> A2[Approximation Algorithms<br>Polynomial time + quality guarantee]
    NPC --> A3[Randomized Algorithms<br>Probabilistic guarantee]
    NPC --> A4[Heuristic Algorithms<br>No theoretical guarantee]
    A2 --> FPTAS
    A2 --> PTAS
    A2 --> CONST[Constant approx. ratio]
    A2 --> LOG[Logarithmic approx. ratio]
```
References
- Motwani & Raghavan, Randomized Algorithms
- Vazirani, Approximation Algorithms
- Williamson & Shmoys, The Design of Approximation Algorithms