Greedy Algorithms
Overview
Greedy algorithms make the locally optimal choice at each step, hoping to arrive at a globally optimal solution. Greedy algorithms are not always correct, but when a problem exhibits the greedy choice property and optimal substructure, greedy strategies can efficiently produce optimal solutions.
1. Fundamental Elements of Greedy Algorithms
1.1 Greedy Choice Property
Greedy choice property: By making locally optimal choices (greedy choices), one can arrive at a globally optimal solution. That is, there exists some optimal solution that includes the greedy choice.
1.2 Optimal Substructure
Optimal substructure: After making the greedy choice, what remains is a smaller subproblem of the same type, and combining the greedy choice with an optimal solution to that subproblem yields an optimal solution to the original problem.
1.3 Proof Framework for Greedy Algorithms
- Greedy choice property: Prove that there exists an optimal solution containing the first greedy choice
- Common technique: Exchange argument -- assume the optimal solution does not include the greedy choice, then show by exchange that the modified solution is no worse
- Optimal substructure: Prove that after the greedy choice, the problem reduces to a smaller instance of the same type
2. Activity Selection Problem
2.1 Problem Definition
Given \(n\) activities, each with a start time \(s_i\) and finish time \(f_i\), select the maximum number of mutually non-overlapping activities.
2.2 Greedy Strategy
Sort by finish time, then iteratively select the earliest-finishing activity that does not conflict with already selected activities.
```python
def activity_selection(activities):
    """Activity selection: greedy by finish time."""
    if not activities:
        return []
    # Sort by finish time (sorted() avoids mutating the caller's list)
    activities = sorted(activities, key=lambda x: x[1])
    selected = [activities[0]]
    last_end = activities[0][1]
    for start, end in activities[1:]:
        if start >= last_end:  # Compatible with the last selected activity
            selected.append((start, end))
            last_end = end
    return selected
```
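The strategy can be traced end to end on a small instance (the activity times below are illustrative, in the style of the standard textbook example):

```python
activities = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9),
              (6, 10), (8, 11), (8, 12), (2, 14), (12, 16)]

# Greedy by finish time, as described above
activities.sort(key=lambda x: x[1])
selected, last_end = [], float("-inf")
for start, end in activities:
    if start >= last_end:          # no overlap with the last pick
        selected.append((start, end))
        last_end = end

print(selected)  # → [(1, 4), (5, 7), (8, 11), (12, 16)]
```

The four selected intervals are pairwise compatible; each pick is the earliest-finishing activity that fits.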
2.3 Correctness Proof
Greedy choice property: Let \(a_1\) be the activity with the earliest finish time. There exists an optimal solution containing \(a_1\).
Proof: Let \(O\) be an optimal solution, and \(a_k\) be the activity in \(O\) with the earliest finish time. Since \(f_1 \leq f_k\), replacing \(a_k\) with \(a_1\) does not conflict with any other activity in \(O\), so the new solution is still optimal. \(\square\)
3. Huffman Coding
3.1 Problem Definition
Given a character set with frequencies, construct a prefix-free variable-length encoding that minimizes the total encoding length.
3.2 Algorithm
```python
import heapq

def huffman(freq):
    """Huffman coding: repeatedly merge the two lowest-frequency nodes."""
    # Each node is a tuple (frequency, tie-breaking index, left, right);
    # leaves carry their character index, internal nodes use -1
    heap = [(f, i, None, None) for i, f in enumerate(freq)]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Extract the two nodes with lowest frequency
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)
        # Merge them into an internal node
        merged = (left[0] + right[0], -1, left, right)
        heapq.heappush(heap, merged)
    return heap[0]  # Root node
```
Time complexity: \(O(n \log n)\)
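The function above returns only the root of the tree; to recover the actual bit strings, walk the tree, appending 0 on left edges and 1 on right edges. The helper below is a hypothetical addition, assuming the `(freq, index, left, right)` node layout used above:

```python
def huffman_codes(node, prefix="", codes=None):
    """Collect the code of each leaf in a Huffman tree.

    Leaves have both children set to None; their second field is the
    index of the character in the original frequency list.
    """
    if codes is None:
        codes = {}
    freq, index, left, right = node
    if left is None and right is None:
        codes[index] = prefix or "0"  # single-symbol edge case
        return codes
    huffman_codes(left, prefix + "0", codes)
    huffman_codes(right, prefix + "1", codes)
    return codes
```

For example, a three-leaf tree built in this layout yields one 1-bit code and two 2-bit codes, matching the merge order.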
3.3 Optimality
Huffman coding produces an optimal prefix code: no prefix-free code has a smaller total weighted length (optimal codes need not be unique).
Properties:
- The two characters with the lowest frequencies are siblings (deepest leaves) in the optimal encoding tree
- The average code length of Huffman coding satisfies: \(H(X) \leq \bar{L} < H(X) + 1\), where \(H(X)\) is the information entropy
```mermaid
graph TB
R((100)) --> F["f:45<br>code:0"]
R --> RL((55))
RL --> RLL((25))
RL --> RLR((30))
RLL --> A["a:12<br>code:100"]
RLL --> B["b:13<br>code:101"]
RLR --> C["c:16<br>code:110"]
RLR --> RLRR((14))
RLRR --> D["d:5<br>code:1110"]
RLRR --> E["e:9<br>code:1111"]
```
4. Minimum Spanning Trees
4.1 Kruskal's Algorithm
Greedy strategy: Sort edges by weight in ascending order, and add each edge that does not form a cycle.
```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return False  # already in the same component
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px  # union by rank
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        return True


def kruskal(n, edges):
    """Kruskal's minimum spanning tree."""
    edges.sort(key=lambda e: e[2])  # Sort by weight
    uf = UnionFind(n)
    mst = []
    total = 0
    for u, v, w in edges:
        if uf.union(u, v):  # edge joins two components: no cycle
            mst.append((u, v, w))
            total += w
            if len(mst) == n - 1:  # early exit: tree is complete
                break
    return mst, total
```
Time complexity: \(O(E \log E)\) (dominated by sorting)
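As a sanity check, the same greedy idea can be run inline on a tiny graph (the edge list and weights below are made up for this sketch), using a minimal path-halving union-find:

```python
# Illustrative 4-vertex graph: (u, v, weight)
edges = [(0, 1, 4), (0, 2, 3), (1, 2, 1), (1, 3, 2), (2, 3, 4)]
parent = list(range(4))

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

total, mst = 0, []
for u, v, w in sorted(edges, key=lambda e: e[2]):
    ru, rv = find(u), find(v)
    if ru != rv:              # edge joins two components: no cycle
        parent[ru] = rv
        mst.append((u, v, w))
        total += w

print(total)  # → 6
```

The three cheapest cycle-free edges (weights 1, 2, 3) form the tree; the two weight-4 edges would each close a cycle and are rejected.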
4.2 Prim's Algorithm
Greedy strategy: Start from any vertex, and at each step add the minimum-weight edge connecting the selected set to the unselected set.
```python
import heapq

def prim(n, adj):
    """Prim's minimum spanning tree."""
    visited = [False] * n
    min_heap = [(0, 0)]  # (weight, vertex); start from vertex 0
    total = 0
    vertices_in_tree = 0  # counts vertices added, including the start
    while min_heap and vertices_in_tree < n:
        w, u = heapq.heappop(min_heap)
        if visited[u]:
            continue
        visited[u] = True
        total += w
        vertices_in_tree += 1
        for v, weight in adj[u]:
            if not visited[v]:
                heapq.heappush(min_heap, (weight, v))
    return total
```
Time complexity: \(O(E \log V)\) (binary heap), \(O(E + V \log V)\) (Fibonacci heap)
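A minimal trace of the strategy on a small graph (the adjacency lists below are invented for this sketch):

```python
import heapq

# Illustrative graph as adjacency lists: vertex -> [(neighbor, weight), ...]
adj = {
    0: [(1, 4), (2, 3)],
    1: [(0, 4), (2, 1), (3, 2)],
    2: [(0, 3), (1, 1), (3, 4)],
    3: [(1, 2), (2, 4)],
}

visited = set()
heap = [(0, 0)]  # (edge weight, vertex); start from vertex 0
total = 0
while heap and len(visited) < len(adj):
    w, u = heapq.heappop(heap)
    if u in visited:
        continue
    visited.add(u)        # u joins the tree via its cheapest crossing edge
    total += w
    for v, weight in adj[u]:
        if v not in visited:
            heapq.heappush(heap, (weight, v))

print(total)  # → 6
```

At each step the heap surfaces the lightest edge crossing the cut between the tree and the rest of the graph, which is exactly the cut property discussed below.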
4.3 Cut Property
Theorem: For any cut \((S, V \setminus S)\) of graph \(G\), the minimum-weight edge crossing the cut belongs to some minimum spanning tree.
This is the unified proof basis for the correctness of both Kruskal's and Prim's algorithms.
5. Fractional Knapsack Problem
5.1 Problem Definition
\(n\) items, the \(i\)-th with weight \(w_i\) and value \(v_i\), knapsack capacity \(W\). Fractions of items may be taken.
5.2 Greedy Strategy
Sort by value-to-weight ratio \(v_i / w_i\) in descending order, then fill the knapsack greedily.
```python
def fractional_knapsack(items, capacity):
    """Fractional knapsack: greedy by value per unit weight."""
    # items: [(value, weight), ...]
    items.sort(key=lambda x: x[0] / x[1], reverse=True)
    total_value = 0
    for value, weight in items:
        if capacity >= weight:
            total_value += value  # take the whole item
            capacity -= weight
        else:
            total_value += value * (capacity / weight)  # take a fraction
            break
    return total_value
```
Time complexity: \(O(n \log n)\)
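A worked run on three illustrative items shows the fractional step in action:

```python
items = [(60, 10), (100, 20), (120, 30)]  # (value, weight); illustrative
capacity = 50

# Greedy by value-to-weight ratio (6, 5, 4 respectively)
total_value = 0.0
for value, weight in sorted(items, key=lambda x: x[0] / x[1], reverse=True):
    if capacity >= weight:
        total_value += value                      # take the whole item
        capacity -= weight
    else:
        total_value += value * capacity / weight  # take a fraction
        break

print(total_value)  # → 240.0
```

Both full items fit (weights 10 + 20), leaving capacity 20 of the third item's weight 30, so two thirds of its value 120 is added.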
0-1 Knapsack vs Fractional Knapsack
The fractional knapsack can be solved greedily, but the 0-1 knapsack (items cannot be split) cannot. The 0-1 knapsack is NP-hard; it is typically solved with dynamic programming, which runs in pseudo-polynomial time \(O(nW)\).
6. Job Scheduling
6.1 Minimizing Completion Time (SPT Rule)
\(n\) jobs to be processed on a single machine, job \(i\) has processing time \(p_i\). Minimize the average completion time.
Greedy strategy: Sort by processing time in ascending order (Shortest Processing Time first).
Average completion time: if the jobs are processed in nondecreasing order \(p_{(1)} \leq p_{(2)} \leq \cdots \leq p_{(n)}\), then job \(i\)'s processing time contributes to its own and every later completion time, so
\[\bar{C} = \frac{1}{n}\sum_{i=1}^{n} C_i = \frac{1}{n}\sum_{i=1}^{n} (n - i + 1)\, p_{(i)},\]
which is minimized by pairing the largest coefficients with the smallest processing times; an adjacent-exchange argument shows any other order is no better.
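The SPT rule can be checked numerically on a small instance (the processing times below are illustrative), comparing it against the worst ordering:

```python
def avg_completion(order):
    """Average completion time when jobs run in the given order."""
    t, total = 0, 0
    for dur in order:
        t += dur      # this job finishes at time t
        total += t
    return total / len(order)

p = [3, 1, 4, 2]
spt = sorted(p)  # shortest processing time first

print(avg_completion(spt))                    # → 5.0
print(avg_completion(sorted(p, reverse=True)))  # → 7.5
```

Any swap of an adjacent out-of-order pair in the schedule cannot decrease the average, which is the exchange argument behind the rule.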
6.2 Job Scheduling with Deadlines
\(n\) jobs, each with profit \(p_i\) and deadline \(d_i\), each requiring unit time. Select a subset of jobs to maximize total profit.
Greedy strategy: Sort by profit in descending order, and schedule each job in the latest available time slot before its deadline.
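A minimal sketch of this strategy, assuming unit-time jobs given as `(profit, deadline)` pairs (the instance in the example is illustrative):

```python
def schedule_with_deadlines(jobs):
    """Greedy job sequencing: jobs = [(profit, deadline), ...].

    Each job takes unit time; in decreasing profit order, place each
    job into the latest still-free slot before its deadline.
    """
    max_d = max(d for _, d in jobs)
    slot = [None] * (max_d + 1)  # slot[t] holds the job run in (t-1, t]
    total = 0
    for profit, deadline in sorted(jobs, reverse=True):
        for t in range(min(deadline, max_d), 0, -1):
            if slot[t] is None:
                slot[t] = (profit, deadline)
                total += profit
                break  # job placed; otherwise it is dropped
    return total

print(schedule_with_deadlines([(100, 2), (19, 1), (27, 2), (25, 1), (15, 3)]))  # → 142
```

The linear slot scan makes this \(O(n^2)\) in the worst case; a disjoint-set structure that jumps to the nearest free slot brings it close to \(O(n \log n)\).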
7. Matroids and Greedy Algorithms
7.1 Matroids
A matroid \(M = (S, \mathcal{I})\):
- \(S\): Finite set (ground set)
- \(\mathcal{I} \subseteq 2^S\): Family of independent sets
Satisfying:
- Hereditary property: \(B \in \mathcal{I}, A \subseteq B \implies A \in \mathcal{I}\)
- Exchange property: \(A, B \in \mathcal{I}, |A| < |B| \implies \exists x \in B \setminus A, A \cup \{x\} \in \mathcal{I}\)
7.2 Greedy Algorithms on Matroids
Theorem: For a weighted matroid (independent set family with weights), the greedy algorithm (add elements in decreasing weight order, maintaining independence) produces the maximum-weight independent set.
Examples:
- Graphic matroid (edge set, independent sets = acyclic edge sets) \(\to\) Kruskal's algorithm
- Uniform matroid \(\to\) Selecting the \(k\) largest elements
- Partition matroid \(\to\) Coloring problems
8. Examples Where Greedy Fails
8.1 0-1 Knapsack
Items: \(v_1=60, w_1=10\); \(v_2=100, w_2=20\); \(v_3=120, w_3=30\). Capacity \(W=50\).
- Greedy (by \(v/w\)): Select 1, 2 \(\to\) total value 160
- Optimal: Select 2, 3 \(\to\) total value 220
8.2 Set Cover (Approximation)
Greedy does not always yield the optimal solution, but it achieves an \(H_n \approx \ln n\) approximation ratio, which is essentially the best possible for a polynomial-time algorithm unless \(P = NP\).
8.3 Coin Change
Denominations \(\{1, 3, 4\}\), target \(6\):
- Greedy (largest denomination first): \(4 + 1 + 1\), using 3 coins
- Optimal: \(3 + 3\), using 2 coins
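Both strategies can be checked side by side; the dynamic program below is a standard coin-change table, included as an illustrative sketch:

```python
def greedy_coins(coins, target):
    """Largest-denomination-first greedy (can be suboptimal)."""
    count = 0
    for c in sorted(coins, reverse=True):
        count += target // c
        target %= c
    return count if target == 0 else None

def min_coins(coins, target):
    """Dynamic programming: exact minimum number of coins."""
    INF = float("inf")
    dp = [0] + [INF] * target  # dp[t] = fewest coins summing to t
    for t in range(1, target + 1):
        for c in coins:
            if c <= t and dp[t - c] + 1 < dp[t]:
                dp[t] = dp[t - c] + 1
    return dp[target]

print(greedy_coins([1, 3, 4], 6))  # → 3  (4 + 1 + 1)
print(min_coins([1, 3, 4], 6))     # → 2  (3 + 3)
```

With canonical coin systems such as \(\{1, 5, 10, 25\}\) the greedy answer happens to be optimal, which is why the failure only shows up on non-canonical denominations like these.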
When to Use Greedy
- When clear greedy choice property and optimal substructure exist
- When the problem can be modeled as a matroid
- As a heuristic to obtain approximate solutions
- When uncertain, try greedy first, then verify with counterexamples
References
- Cormen et al., Introduction to Algorithms (CLRS), Chapter 16
- Kleinberg & Tardos, Algorithm Design, Chapter 4