Randomized and Approximation Algorithms
Overview
Randomized algorithms leverage randomness to simplify design, improve efficiency, or handle uncertainty. Approximation algorithms sacrifice exact optimality to provide solutions with quality guarantees in polynomial time. Both are important tools for tackling hard problems.
1. Classification of Randomized Algorithms
1.1 Las Vegas Algorithms
- Results are always correct
- Running time is a random variable
- Expected running time is guaranteed
Example: Randomized quicksort
1.2 Monte Carlo Algorithms
- Running time is deterministic (or has an upper bound)
- Results may be incorrect, but the error probability is bounded
Example: Miller-Rabin primality test
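For concreteness, here is a minimal Miller-Rabin sketch (a one-sided Monte Carlo test: a "composite" verdict is always correct, while a "probably prime" verdict errs with probability at most \(4^{-\text{rounds}}\); the structure is standard, the details are illustrative):

```python
import random

def miller_rabin(n, rounds=20):
    """Monte Carlo primality test with one-sided error."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7):
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2^s with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # witness found: definitely composite
    return True  # probably prime
```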
```mermaid
graph TD
    RA[Randomized Algorithms] --> LV[Las Vegas]
    RA --> MC[Monte Carlo]
    LV --> LV1[Result always correct]
    LV --> LV2[Time is random]
    MC --> MC1[Time is deterministic]
    MC --> MC2[Result may be incorrect]
    MC --> MC3[Error probability reducible by repetition]
```
2. Randomized Quicksort
2.1 Algorithm
Randomly select the pivot element:
```python
import random

def partition(arr, lo, hi):
    # Lomuto partition around arr[hi]; returns the pivot's final index
    pivot = arr[hi]
    i = lo - 1
    for j in range(lo, hi):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[hi] = arr[hi], arr[i + 1]
    return i + 1

def randomized_quicksort(arr, lo, hi):
    if lo < hi:
        # Swap a uniformly random pivot into position hi
        pivot_idx = random.randint(lo, hi)
        arr[pivot_idx], arr[hi] = arr[hi], arr[pivot_idx]
        p = partition(arr, lo, hi)
        randomized_quicksort(arr, lo, p - 1)
        randomized_quicksort(arr, p + 1, hi)
```
2.2 Expected Time Analysis
Theorem: The expected number of comparisons in randomized quicksort is \(O(n \log n)\).
Proof: Let the sorted elements be \(z_1 < z_2 < \cdots < z_n\). Define indicator random variables \(X_{ij} = \mathbb{1}\{z_i \text{ is compared with } z_j\}\) for \(i < j\).
\(z_i\) and \(z_j\) are compared if and only if the first element chosen as pivot from \(\{z_i, z_{i+1}, \ldots, z_j\}\) is either \(z_i\) or \(z_j\), which occurs with probability \(\frac{2}{j - i + 1}\).
Total comparisons, by linearity of expectation:
\[
E\Big[\sum_{i < j} X_{ij}\Big] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j - i + 1} \le \sum_{i=1}^{n-1} \sum_{k=2}^{n} \frac{2}{k} = O(n \log n). \;\square
\]
3. Bloom Filters
3.1 Principle
A Bloom filter is a probabilistic data structure that supports efficient set membership queries.
- Structure: A bit array of length \(m\) together with \(k\) independent hash functions
- Insert: Set the \(k\) hashed positions to 1
- Query: Check whether all \(k\) positions are 1
3.2 Error Rate
False positive probability (after inserting \(n\) elements):
\[
p \approx \left(1 - e^{-kn/m}\right)^{k}
\]
Optimal number of hash functions:
\[
k^{*} = \frac{m}{n} \ln 2
\]
In this case, \(p \approx (0.6185)^{m/n}\). For example, at \(m/n = 10\) bits per element, the optimum is \(k^* \approx 7\) and \(p \approx 0.8\%\).
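These formulas can be inverted to size a filter: given \(n\) and a target false-positive rate \(p\), take \(m = \lceil -n \ln p / (\ln 2)^2 \rceil\). A small helper (illustrative name, not a library function):

```python
import math

def bloom_params(n, p):
    """Bit-array size m and hash count k for n items at target FP rate p."""
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    k = max(1, round(m / n * math.log(2)))
    return m, k

print(bloom_params(1_000_000, 0.01))  # roughly (9585059, 7): ~1.2 MB for 1M items at 1% FP
```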
```python
class BloomFilter:
    def __init__(self, m, k, hash_funcs):
        self.bits = [False] * m
        self.m = m
        self.hash_funcs = hash_funcs[:k]  # use the first k hash functions

    def insert(self, item):
        for h in self.hash_funcs:
            self.bits[h(item) % self.m] = True

    def query(self, item):
        # True means "possibly present"; False means "definitely absent"
        return all(self.bits[h(item) % self.m] for h in self.hash_funcs)
```
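A quick usage sketch with two ad-hoc hash functions built from hashlib (illustrative; production filters use independent, fast hashes such as MurmurHash):

```python
import hashlib

def h1(item):
    return int(hashlib.md5(item.encode()).hexdigest(), 16)

def h2(item):
    return int(hashlib.sha1(item.encode()).hexdigest(), 16)

bf = BloomFilter(m=1000, k=2, hash_funcs=[h1, h2])
bf.insert("alice")
print(bf.query("alice"))  # True
print(bf.query("bob"))    # False with high probability (false positives possible)
```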
Characteristics:
- No false negatives (if it says not present, it is definitely not present)
- False positives possible (may falsely report presence)
- Extremely space-efficient
- Does not support deletion (use Counting Bloom Filter for that)
4. Skip Lists
4.1 Structure
A skip list is a randomized data structure built as a hierarchy of linked lists: every element appears at level 0, and each node at level \(i\) is promoted to level \(i+1\) with probability \(p = 1/2\).
```
Level 3: 1 -----------------> 9
Level 2: 1 -------> 5 ------> 9
Level 1: 1 --> 3 -> 5 -> 7 -> 9
Level 0: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8 -> 9
```
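A compact sketch of search and insert following this promotion rule (the level cap of 16 and all names are illustrative):

```python
import random

class SkipNode:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * (level + 1)  # forward[i]: next node at level i

class SkipList:
    MAX_LEVEL = 16
    P = 0.5  # promotion probability

    def __init__(self):
        self.head = SkipNode(None, self.MAX_LEVEL)
        self.level = 0

    def _random_level(self):
        lvl = 0
        while random.random() < self.P and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def search(self, key):
        node = self.head
        for i in range(self.level, -1, -1):   # descend level by level
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

    def insert(self, key):
        update = [self.head] * (self.MAX_LEVEL + 1)
        node = self.head
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node                  # last node before key at level i
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = SkipNode(key, lvl)
        for i in range(lvl + 1):              # splice in at levels 0..lvl
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

sl = SkipList()
for x in [3, 7, 1, 9, 5]:
    sl.insert(x)
print(sl.search(7), sl.search(4))  # True False
```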
4.2 Performance
| Operation | Expected Time | Worst-Case Time |
|---|---|---|
| Search | \(O(\log n)\) | \(O(n)\) |
| Insert | \(O(\log n)\) | \(O(n)\) |
| Delete | \(O(\log n)\) | \(O(n)\) |
Expected number of levels: \(O(\log n)\); expected space: \(O(n)\).
Advantages: Simple implementation, supports range queries, no complex rebalancing operations. Redis uses skip lists to implement sorted sets.
5. Approximation Algorithm Basics
5.1 Approximation Ratio
For a minimization problem, the approximation ratio \(\rho(n)\) of algorithm \(A\) is
\[
\rho(n) = \max_{|x| = n} \frac{A(x)}{OPT(x)}
\]
For a maximization problem:
\[
\rho(n) = \max_{|x| = n} \frac{OPT(x)}{A(x)}
\]
In both cases \(\rho(n) \geq 1\); the closer to 1, the better the guarantee.
5.2 Approximation Schemes
| Type | Definition | Running Time |
|---|---|---|
| PTAS | \((1+\varepsilon)\)-approximation for any \(\varepsilon > 0\) | \(n^{f(1/\varepsilon)}\) |
| FPTAS | \((1+\varepsilon)\)-approximation for any \(\varepsilon > 0\) | \(\text{poly}(n, 1/\varepsilon)\) |
FPTAS gives the strongest approximation guarantee: the running time is polynomial in both \(n\) and \(1/\varepsilon\), and the solution can be made arbitrarily close to optimal.
6. 2-Approximation for Vertex Cover
6.1 Problem
Given a graph \(G = (V, E)\), find the minimum vertex cover (every edge has at least one endpoint selected).
6.2 Greedy Matching Algorithm
```python
def vertex_cover_2approx(graph):
    """Vertex cover 2-approximation; `graph` is any object whose .edges()
    yields 2-tuples of endpoints (e.g. a networkx.Graph)."""
    cover = set()
    edges = set(graph.edges())
    while edges:
        # Pick any remaining edge and take BOTH endpoints
        u, v = next(iter(edges))
        cover.add(u)
        cover.add(v)
        # Remove all edges incident to u or v
        edges = {(a, b) for (a, b) in edges
                 if u not in (a, b) and v not in (a, b)}
    return cover
```
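For example, on a 5-vertex path graph (assuming networkx is available; the exact output varies with edge iteration order):

```python
import networkx as nx

G = nx.path_graph(5)            # edges: (0,1), (1,2), (2,3), (3,4)
print(vertex_cover_2approx(G))  # e.g. {0, 1, 2, 3}; the optimum {1, 3} has size 2
```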
6.3 Approximation Ratio Proof
Let the algorithm pick \(k\) edges. Whenever an edge \((u, v)\) is picked, all edges incident to \(u\) or \(v\) are removed, so the picked edges share no endpoints: they form a maximal matching \(M\).
- Algorithm output: \(|C| = 2k\)
- The optimal cover must cover each edge in \(M\); since these \(k\) edges are pairwise disjoint, this requires at least \(k\) distinct vertices, so \(OPT \geq k\)
- Therefore \(|C| = 2k \leq 2 \cdot OPT\)
Approximation ratio \(\rho = 2\). \(\square\)
7. Greedy Approximation for Set Cover
7.1 Problem
Given a universe \(U\) and a family of subsets \(S_1, \ldots, S_m \subseteq U\) whose union is \(U\), select the fewest subsets that together cover \(U\).
7.2 Greedy Algorithm
```python
def greedy_set_cover(universe, subsets):
    """Set cover greedy algorithm (assumes the subsets can cover universe)."""
    covered = set()
    chosen = []
    while covered != universe:
        # Select the subset covering the most uncovered elements
        best = max(subsets, key=lambda s: len(s - covered))
        chosen.append(best)
        covered |= best
    return chosen
```
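A quick check of the greedy rule on a toy instance (values are illustrative):

```python
universe = set(range(1, 11))
subsets = [set(range(1, 7)), set(range(5, 11)), {1, 2, 3}, {8, 9, 10}]
for s in greedy_set_cover(universe, subsets):
    print(sorted(s))
# [1, 2, 3, 4, 5, 6]   -- covers 6 new elements
# [5, 6, 7, 8, 9, 10]  -- covers the remaining 4
```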
7.3 Approximation Ratio
Theorem: The greedy algorithm has approximation ratio \(H_n = \sum_{i=1}^{n} \frac{1}{i} = O(\ln n)\), where \(n = |U|\).
Proof sketch: Use a charging argument: each greedy selection costs 1, split evenly among its newly covered elements. When the \(k\)-th element (in coverage order) is covered, at least \(n - k + 1\) elements were still uncovered, and some set of the optimal solution covered at least \(\frac{n - k + 1}{OPT}\) of them; the greedy choice covers at least as many, so the element is charged at most \(\frac{OPT}{n - k + 1}\). Summing over all elements gives total cost \(\leq OPT \cdot H_n\).
Optimality
Under the \(P \neq NP\) assumption, set cover cannot be approximated in polynomial time to better than \((1-\varepsilon) \ln n\) (Dinur & Steurer 2014). Thus the greedy algorithm's approximation ratio is essentially optimal.
8. Randomized Approximation for MAX-SAT
8.1 Problem
Given a CNF formula, maximize the number of satisfied clauses.
8.2 Random Assignment Algorithm
Independently set each variable to TRUE/FALSE with probability \(1/2\).
Theorem: Let the formula have \(m\) clauses, each with at least \(k\) literals. The expected number of satisfied clauses under random assignment is \(\geq (1 - 2^{-k}) \cdot m\).
Proof: Clause \(C_j\) has \(k_j \geq k\) literals over distinct variables; it is unsatisfied only when all \(k_j\) independent literals come out false, which happens with probability \(2^{-k_j} \leq 2^{-k}\). By linearity of expectation, the expected number of satisfied clauses is \(\sum_j (1 - 2^{-k_j}) \geq (1 - 2^{-k})\, m\). \(\square\)
For 3-SAT (\(k = 3\)): expected \(\frac{7}{8}m\) satisfied clauses; since \(OPT \leq m\), this gives approximation ratio \(\frac{7}{8}\).
8.3 Derandomization
Method of conditional expectations: Determine variable values one by one, choosing at each step the assignment that does not decrease the conditional expectation. This converts the randomized algorithm into a deterministic one while maintaining the same approximation ratio.
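A sketch of this derandomization, assuming a DIMACS-style clause encoding (literal \(v\) means \(x_v\), \(-v\) means \(\lnot x_v\)); the function names are illustrative:

```python
def expected_satisfied(clauses, assignment):
    """E[# satisfied clauses] when every unassigned variable is set
    uniformly at random; `assignment` maps var -> bool for fixed vars."""
    total = 0.0
    for clause in clauses:
        unset, satisfied = 0, False
        for lit in clause:
            var = abs(lit)
            if var in assignment:
                if assignment[var] == (lit > 0):
                    satisfied = True
                    break
            else:
                unset += 1
        if satisfied:
            total += 1.0
        elif unset:
            total += 1.0 - 0.5 ** unset  # Pr[some free literal comes out true]
    return total

def derandomized_max_sat(clauses, num_vars):
    assignment = {}
    for v in range(1, num_vars + 1):
        assignment[v] = True
        e_true = expected_satisfied(clauses, assignment)
        assignment[v] = False
        e_false = expected_satisfied(clauses, assignment)
        # Keep whichever value does not decrease the conditional expectation
        assignment[v] = e_true >= e_false
    return assignment

clauses = [[1, 2, 3], [-1, 2], [-2, 3], [1, -3]]
print(derandomized_max_sat(clauses, 3))  # satisfies all 4 clauses here
```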
9. FPTAS for the Knapsack Problem
9.1 Dynamic Programming
The standard DP over weights runs in \(O(nW)\); this is pseudo-polynomial time, since \(W\) can be exponential in the input size (the length of its binary encoding). An alternative DP over total value runs in \(O(n \sum_i v_i)\) and is the basis for the FPTAS below.
9.2 FPTAS Approach
- Scale all values: \(\hat{v}_i = \lfloor v_i \cdot \frac{n}{\varepsilon v_{\max}} \rfloor\)
- Run DP on the scaled values
- Running time: \(O(n^3 / \varepsilon)\)
- Approximation ratio: \((1 - \varepsilon)\)
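A minimal sketch of the scaling step plus the value-indexed DP (illustrative, not a tuned implementation):

```python
def knapsack_fptas(values, weights, W, eps):
    """Returns a value guaranteed to be >= (1 - eps) * OPT."""
    n, vmax = len(values), max(values)
    K = eps * vmax / n                        # scaling factor
    vhat = [int(v / K) for v in values]       # floor-scaled values
    V = sum(vhat)
    INF = float("inf")
    dp = [0] + [INF] * V                      # dp[t] = min weight for scaled value t
    for i in range(n):
        for t in range(V, vhat[i] - 1, -1):
            dp[t] = min(dp[t], dp[t - vhat[i]] + weights[i])
    best = max(t for t in range(V + 1) if dp[t] <= W)
    return best * K                           # lower bound on the achieved true value

print(knapsack_fptas([60, 100, 120], [10, 20, 30], W=50, eps=0.1))  # 220.0 (OPT = 220)
```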
10. Summary Comparison
| Approach | Applicable Scenarios | Guarantees |
|---|---|---|
| Randomized algorithms | Need simplification or speedup | Expected time / probabilistic correctness |
| 2-approximation | Vertex cover, etc. | Constant factor |
| \(O(\ln n)\)-approximation | Set cover, etc. | Logarithmic factor |
| PTAS/FPTAS | Knapsack, etc. | Arbitrary precision |
| Randomized rounding | MAX-SAT, etc. | Constant factor |
```mermaid
graph LR
    NPC[NP-Complete Problems] --> A1[Exact Algorithms<br>Exponential time]
    NPC --> A2[Approximation Algorithms<br>Polynomial time + quality guarantee]
    NPC --> A3[Randomized Algorithms<br>Probabilistic guarantee]
    NPC --> A4[Heuristic Algorithms<br>No theoretical guarantee]
    A2 --> FPTAS
    A2 --> PTAS
    A2 --> CONST[Constant approx. ratio]
    A2 --> LOG[Logarithmic approx. ratio]
```
References
- Motwani & Raghavan, Randomized Algorithms
- Vazirani, Approximation Algorithms
- Williamson & Shmoys, The Design of Approximation Algorithms