
Sorting and Searching

Overview

Sorting and searching are among the most fundamental algorithmic problems. Sorting organizes unordered data into ordered sequences, while searching efficiently locates target elements within datasets. Understanding the characteristics of the various sorting algorithms is an essential foundation for algorithm study.


1. Comparison-Based Sorting

1.1 Lower Bound Theorem

Theorem: Any comparison-based sorting algorithm requires at least \(\Omega(n \log n)\) comparisons in the worst case.

Proof sketch: \(n\) elements have \(n!\) permutations, so the decision tree of any comparison sort needs at least \(n!\) leaves, and its height (the worst-case number of comparisons) is at least \(\lceil \log_2(n!) \rceil\). By Stirling's approximation:

\[ \log_2(n!) = n \log_2 n - n \log_2 e + O(\log n) = \Omega(n \log n) \]
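This bound can be checked numerically; a short sketch using Python's math.lgamma to evaluate \(\log_2(n!)\):

```python
import math

def log2_factorial(n):
    # log2(n!) = ln(Gamma(n + 1)) / ln(2), computed without overflow
    return math.lgamma(n + 1) / math.log(2)

n = 1000
exact = log2_factorial(n)                            # height lower bound
stirling = n * math.log2(n) - n * math.log2(math.e)  # leading Stirling terms
# The gap exact - stirling is the O(log n) term, roughly 0.5 * log2(2 * pi * n)
```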

1.2 Bubble Sort

Repeatedly swap adjacent elements that are in the wrong order.

def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        swapped = False
        for j in range(n - 1 - i):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:
            break
    return arr
  • Best: \(O(n)\) (already sorted, early termination)
  • Worst/Average: \(O(n^2)\)
  • Stable sort, in-place sort

1.3 Selection Sort

Each pass selects the smallest element from the unsorted portion and places it at the end of the sorted portion.

def selection_sort(arr):
    n = len(arr)
    for i in range(n):
        min_idx = i
        for j in range(i + 1, n):
            if arr[j] < arr[min_idx]:
                min_idx = j
        arr[i], arr[min_idx] = arr[min_idx], arr[i]
    return arr
  • All cases: \(O(n^2)\)
  • Unstable sort, in-place sort
  • Performs at most \(n - 1\) swaps, which makes it attractive when writes are expensive

1.4 Insertion Sort

Insert each element into its correct position within the sorted portion.

def insertion_sort(arr):
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
    return arr
  • Best: \(O(n)\) (already sorted)
  • Worst/Average: \(O(n^2)\)
  • Stable sort, in-place sort
  • Very efficient for small datasets or nearly sorted data

1.5 Merge Sort

Divide-and-conquer strategy: Recursively split the array in half, sort each half, then merge.

def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    return merge(left, right)

def merge(left, right):
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result

Recurrence: \(T(n) = 2T(n/2) + O(n)\)

  • All cases: \(O(n \log n)\)
  • Stable sort
  • Extra space: \(O(n)\)

1.6 Quick Sort

Divide-and-conquer strategy: Choose a pivot, partition the array into elements less than and greater than the pivot.

def quicksort(arr, lo, hi):
    if lo < hi:
        p = partition(arr, lo, hi)
        quicksort(arr, lo, p - 1)
        quicksort(arr, p + 1, hi)

def partition(arr, lo, hi):
    pivot = arr[hi]
    i = lo
    for j in range(lo, hi):
        if arr[j] < pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    arr[i], arr[hi] = arr[hi], arr[i]
    return i
  • Average: \(O(n \log n)\)
  • Worst: \(O(n^2)\) (e.g., an already sorted array when the first or last element is always chosen as pivot)
  • Unstable sort, in-place sort
  • Often the fastest general-purpose sorting algorithm in practice

Randomized quicksort: Randomly select the pivot; expected time is \(O(n \log n)\).
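A sketch of randomized quicksort, repeating the Lomuto partition from above so the snippet is self-contained:

```python
import random

def partition(arr, lo, hi):
    # Lomuto partition, identical to the version above
    pivot = arr[hi]
    i = lo
    for j in range(lo, hi):
        if arr[j] < pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    arr[i], arr[hi] = arr[hi], arr[i]
    return i

def randomized_quicksort(arr, lo, hi):
    # Expected O(n log n) comparisons regardless of input order
    if lo < hi:
        r = random.randint(lo, hi)           # pick a random pivot index
        arr[r], arr[hi] = arr[hi], arr[r]    # move it to the partition slot
        p = partition(arr, lo, hi)
        randomized_quicksort(arr, lo, p - 1)
        randomized_quicksort(arr, p + 1, hi)
```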

1.7 Heap Sort

Uses the max-heap data structure.

def heapsort(arr):
    n = len(arr)
    # Build heap
    for i in range(n // 2 - 1, -1, -1):
        heapify(arr, n, i)
    # Extract maximum elements one by one
    for i in range(n - 1, 0, -1):
        arr[0], arr[i] = arr[i], arr[0]
        heapify(arr, i, 0)

def heapify(arr, n, i):
    largest = i
    left, right = 2 * i + 1, 2 * i + 2
    if left < n and arr[left] > arr[largest]:
        largest = left
    if right < n and arr[right] > arr[largest]:
        largest = right
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        heapify(arr, n, largest)
  • All cases: \(O(n \log n)\)
  • Unstable sort, in-place sort
  • Heap construction: \(O(n)\)

2. Non-Comparison Sorting

2.1 Counting Sort

Suitable for integers with a limited range.

def counting_sort(arr, max_val):
    # Assumes non-negative integers in the range [0, max_val]
    count = [0] * (max_val + 1)
    for x in arr:
        count[x] += 1
    result = []
    for val, cnt in enumerate(count):
        result.extend([val] * cnt)
    return result
  • Time: \(O(n + k)\), where \(k\) is the range
  • Space: \(O(n + k)\)
  • Stable sort

2.2 Radix Sort

Sort digit by digit from the least significant to the most significant, using a stable sort at each step.

def radix_sort(arr):
    # Assumes non-negative integers; relies on a stable per-digit sort
    if not arr:
        return
    max_val = max(arr)
    exp = 1
    while max_val // exp > 0:
        counting_sort_by_digit(arr, exp)
        exp *= 10
  • Time: \(O(d \cdot (n + b))\), where \(d\) is the number of digits and \(b\) is the base
  • Space: \(O(n + b)\)
  • Stable sort
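The per-digit pass counting_sort_by_digit is left undefined above; one possible stable, base-10 version for non-negative integers:

```python
def counting_sort_by_digit(arr, exp):
    # Stable counting sort of arr by the digit at place value exp (base 10)
    n = len(arr)
    output = [0] * n
    count = [0] * 10
    for x in arr:
        count[(x // exp) % 10] += 1
    # Prefix sums turn digit counts into end positions for each digit
    for d in range(1, 10):
        count[d] += count[d - 1]
    # Traverse in reverse so equal digits keep their relative order (stability)
    for x in reversed(arr):
        digit = (x // exp) % 10
        count[digit] -= 1
        output[count[digit]] = x
    arr[:] = output
```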

2.3 Bucket Sort

Distribute elements into several buckets, sort within each bucket, then concatenate.

  • Average time: \(O(n + k)\) (for uniformly distributed data)
  • Worst: \(O(n^2)\) (all elements in one bucket)
  • Suitable for sorting uniformly distributed floating-point numbers
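A minimal sketch of bucket sort for floats in \([0, 1)\), using Python's built-in sorted within each bucket for brevity (insertion sort is the textbook choice):

```python
def bucket_sort(arr, num_buckets=None):
    # Distribute floats in [0, 1) into buckets, sort each, concatenate
    if not arr:
        return []
    k = num_buckets or len(arr)
    buckets = [[] for _ in range(k)]
    for x in arr:
        buckets[min(int(x * k), k - 1)].append(x)
    result = []
    for b in buckets:
        result.extend(sorted(b))
    return result
```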

3. Sorting Algorithm Comparison

| Algorithm | Best | Average | Worst | Space | Stable | In-Place |
| --- | --- | --- | --- | --- | --- | --- |
| Bubble Sort | \(O(n)\) | \(O(n^2)\) | \(O(n^2)\) | \(O(1)\) | Yes | Yes |
| Selection Sort | \(O(n^2)\) | \(O(n^2)\) | \(O(n^2)\) | \(O(1)\) | No | Yes |
| Insertion Sort | \(O(n)\) | \(O(n^2)\) | \(O(n^2)\) | \(O(1)\) | Yes | Yes |
| Merge Sort | \(O(n\log n)\) | \(O(n\log n)\) | \(O(n\log n)\) | \(O(n)\) | Yes | No |
| Quick Sort | \(O(n\log n)\) | \(O(n\log n)\) | \(O(n^2)\) | \(O(\log n)\) | No | Yes |
| Heap Sort | \(O(n\log n)\) | \(O(n\log n)\) | \(O(n\log n)\) | \(O(1)\) | No | Yes |
| Counting Sort | \(O(n+k)\) | \(O(n+k)\) | \(O(n+k)\) | \(O(n+k)\) | Yes | No |
| Radix Sort | \(O(dn)\) | \(O(dn)\) | \(O(dn)\) | \(O(n+b)\) | Yes | No |
| Bucket Sort | \(O(n+k)\) | \(O(n+k)\) | \(O(n^2)\) | \(O(n+k)\) | Yes | No |
The choice of algorithm can be summarized as a decision flowchart (Mermaid source):

graph TD
    Q1{Large dataset?} -->|No| INS[Insertion Sort]
    Q1 -->|Yes| Q2{Limited range?}
    Q2 -->|Yes| COUNT[Counting/Radix Sort]
    Q2 -->|No| Q3{Stability needed?}
    Q3 -->|Yes| MERGE[Merge Sort]
    Q3 -->|No| Q4{Memory constrained?}
    Q4 -->|Yes| HEAP[Heap Sort]
    Q4 -->|No| QUICK[Quick Sort]

4. Binary Search

4.1 Standard Binary Search

Search for a target value in a sorted array.

def binary_search(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2  # Avoids overflow in fixed-width languages; Python ints don't overflow
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
  • Time: \(O(\log n)\)
  • Space: \(O(1)\)

4.2 Binary Search Variants

Find the first position whose value is \(\geq\) target (the lower bound; this is target's first occurrence when it is present):

def lower_bound(arr, target):
    lo, hi = 0, len(arr)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo

Find the first position whose value is strictly greater than target (the upper bound):

def upper_bound(arr, target):
    lo, hi = 0, len(arr)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if arr[mid] <= target:
            lo = mid + 1
        else:
            hi = mid
    return lo
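In Python, bisect.bisect_left and bisect.bisect_right implement exactly these two bounds; combining them counts the occurrences of a value:

```python
import bisect

def count_occurrences(arr, target):
    # Size of the run of elements equal to target in a sorted array:
    # upper_bound - lower_bound
    return bisect.bisect_right(arr, target) - bisect.bisect_left(arr, target)

# e.g. count_occurrences([1, 2, 2, 2, 3, 5], 2) → 3
```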

Search in a rotated sorted array:

def search_rotated(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if arr[mid] == target:
            return mid
        if arr[lo] <= arr[mid]:  # Left half is sorted
            if arr[lo] <= target < arr[mid]:
                hi = mid - 1
            else:
                lo = mid + 1
        else:  # Right half is sorted
            if arr[mid] < target <= arr[hi]:
                lo = mid + 1
            else:
                hi = mid - 1
    return -1
4.3 Applications

| Application | Description |
| --- | --- |
| Sorted array lookup | Basic application |
| Square root computation | Binary approximation of \(\sqrt{x}\) |
| Minimize the maximum | Binary search on the answer |
| Allocation problems | Binary search + greedy verification |
| Peak finding | Mountain array |
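As one worked example from the table, the integer square root can be computed by binary search over the answer:

```python
def isqrt_binary(x):
    # Largest integer r with r * r <= x, found by binary search (x >= 0)
    if x < 2:
        return x
    lo, hi = 1, x // 2
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if mid * mid <= x:
            lo = mid + 1    # mid is feasible, try larger
        else:
            hi = mid - 1    # mid overshoots
    return hi               # hi is the last feasible value
```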

Binary Search Thinking Framework

Binary search applies not only to "finding an element in a sorted array" but more broadly to any decision problem with monotonicity: if the answer \(x\) is feasible, then \(x+1\) is also feasible (or vice versa). This "binary search on the answer" paradigm is very powerful.
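A generic sketch of this paradigm: binary search over candidate answers with a monotone feasibility predicate. The helper names (min_feasible, min_max_split) are illustrative, not from any library:

```python
def min_feasible(lo, hi, feasible):
    # Smallest x in [lo, hi] with feasible(x) True, assuming feasibility is
    # monotone (once True, stays True). Returns hi + 1 if none is feasible.
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo if feasible(lo) else hi + 1

def min_max_split(nums, k):
    # "Minimize the maximum": smallest cap such that nums can be split into
    # at most k contiguous parts, each with sum <= cap (greedy verification).
    def can_split(cap):
        parts, total = 1, 0
        for x in nums:
            if total + x > cap:
                parts, total = parts + 1, x
            else:
                total += x
        return parts <= k
    return min_feasible(max(nums), sum(nums), can_split)
```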


5. Other Search Algorithms

5.1 Interpolation Search

For uniformly distributed data, estimate the target position by linear interpolation:

\[ \text{pos} = lo + \frac{(target - arr[lo]) \cdot (hi - lo)}{arr[hi] - arr[lo]} \]
  • Average: \(O(\log \log n)\) (uniform distribution)
  • Worst: \(O(n)\)
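A sketch of interpolation search implementing this estimate, using integer arithmetic on a sorted array of numbers:

```python
def interpolation_search(arr, target):
    # Probe the position estimated by linear interpolation between
    # arr[lo] and arr[hi]; O(log log n) on average for uniform data
    lo, hi = 0, len(arr) - 1
    while lo <= hi and arr[lo] <= target <= arr[hi]:
        if arr[lo] == arr[hi]:
            return lo if arr[lo] == target else -1
        pos = lo + (target - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
        if arr[pos] == target:
            return pos
        elif arr[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1
```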

5.2 Exponential Search

First grow the search range exponentially, then binary search within it:

def exponential_search(arr, target):
    if not arr:
        return -1
    if arr[0] == target:
        return 0
    # Double the bound until it passes the target
    bound = 1
    while bound < len(arr) and arr[bound] <= target:
        bound *= 2
    # Standard binary search within [bound // 2, min(bound, len(arr) - 1)]
    lo, hi = bound // 2, min(bound, len(arr) - 1)
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
  • Time: \(O(\log n)\)
  • Suitable for unbounded or very large sorted sequences

6. Selection Algorithms

6.1 The \(k\)-th Smallest Element

QuickSelect:

def quickselect(arr, lo, hi, k):
    # Returns the k-th smallest element (k is a 0-indexed position);
    # partition() is the same Lomuto routine used by quicksort above
    if lo == hi:
        return arr[lo]
    p = partition(arr, lo, hi)
    if k == p:
        return arr[p]
    elif k < p:
        return quickselect(arr, lo, p - 1, k)
    else:
        return quickselect(arr, p + 1, hi, k)
  • Average: \(O(n)\)
  • Worst: \(O(n^2)\)

Median of Medians:

  • Divide the array into \(\lceil n/5 \rceil\) groups of 5
  • Recursively find the median of medians as the pivot
  • Guarantees worst-case \(O(n)\)
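The three steps above can be sketched as a simple, non-in-place implementation (0-indexed k):

```python
def median_of_medians(arr, k):
    # k-th smallest in worst-case O(n) via the median-of-medians pivot rule
    if len(arr) <= 5:
        return sorted(arr)[k]
    # Median of each group of 5, then recurse to find the median of medians
    medians = [sorted(arr[i:i + 5])[len(arr[i:i + 5]) // 2]
               for i in range(0, len(arr), 5)]
    pivot = median_of_medians(medians, len(medians) // 2)
    lows = [x for x in arr if x < pivot]
    pivots = [x for x in arr if x == pivot]
    highs = [x for x in arr if x > pivot]
    if k < len(lows):
        return median_of_medians(lows, k)
    elif k < len(lows) + len(pivots):
        return pivot
    else:
        return median_of_medians(highs, k - len(lows) - len(pivots))
```

The pivot is guaranteed to be greater than roughly 30% and less than roughly 30% of the elements, which bounds the recursion and yields worst-case \(O(n)\).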

References

  • Cormen et al., Introduction to Algorithms (CLRS), Chapters 2, 6-9
  • Sedgewick, Algorithms
