Sorting and Searching
Overview
Sorting and searching are among the most fundamental algorithmic problems: sorting arranges unordered data into an ordered sequence, while searching efficiently locates a target element within a dataset. Understanding the trade-offs among the classic sorting algorithms is an essential foundation for algorithm study.
1. Comparison-Based Sorting
1.1 Lower Bound Theorem
Theorem: Any comparison-based sorting algorithm requires \(\Omega(n \log n)\) comparisons in the worst case.
Proof sketch: \(n\) elements have \(n!\) possible permutations, so the decision tree of any comparison sort needs at least \(n!\) leaves, and hence height at least \(\lceil \log_2(n!) \rceil\). By Stirling's approximation:

\[
\log_2(n!) = n \log_2 n - n \log_2 e + O(\log n) = \Omega(n \log n)
\]
1.2 Bubble Sort
Repeatedly swap adjacent elements that are in the wrong order.
```python
def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        swapped = False
        for j in range(n - 1 - i):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:
            break
    return arr
```
- Best: \(O(n)\) (already sorted, early termination)
- Worst/Average: \(O(n^2)\)
- Stable sort, in-place sort
1.3 Selection Sort
Each pass selects the smallest element from the unsorted portion and places it at the end of the sorted portion.
```python
def selection_sort(arr):
    n = len(arr)
    for i in range(n):
        min_idx = i
        for j in range(i + 1, n):
            if arr[j] < arr[min_idx]:
                min_idx = j
        arr[i], arr[min_idx] = arr[min_idx], arr[i]
    return arr
```
- All cases: \(O(n^2)\)
- Unstable sort, in-place sort
- Performs at most \(n - 1\) swaps, useful when element writes are expensive
1.4 Insertion Sort
Insert each element into its correct position within the sorted portion.
```python
def insertion_sort(arr):
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
    return arr
```
- Best: \(O(n)\) (already sorted)
- Worst/Average: \(O(n^2)\)
- Stable sort, in-place sort
- Very efficient for small datasets or nearly sorted data
1.5 Merge Sort
Divide-and-conquer strategy: Recursively split the array in half, sort each half, then merge.
```python
def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    return merge(left, right)

def merge(left, right):
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result
```
Recurrence: \(T(n) = 2T(n/2) + O(n)\)
- All cases: \(O(n \log n)\)
- Stable sort
- Extra space: \(O(n)\)
1.6 Quick Sort
Divide-and-conquer strategy: Choose a pivot, partition the array into elements less than and greater than the pivot.
```python
def quicksort(arr, lo, hi):
    if lo < hi:
        p = partition(arr, lo, hi)
        quicksort(arr, lo, p - 1)
        quicksort(arr, p + 1, hi)

def partition(arr, lo, hi):
    # Lomuto partition: elements < pivot end up left of index i
    pivot = arr[hi]
    i = lo
    for j in range(lo, hi):
        if arr[j] < pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    arr[i], arr[hi] = arr[hi], arr[i]
    return i
```
- Average: \(O(n \log n)\)
- Worst: \(O(n^2)\) (sorted array with first/last element as pivot)
- Unstable sort, in-place sort
- Often the fastest general-purpose sorting algorithm in practice
Randomized quicksort: Randomly select the pivot; expected time is \(O(n \log n)\).
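A minimal sketch of the randomized variant: before each partition, swap a uniformly random element into the pivot slot. The partition routine is repeated here so the sketch is self-contained.

```python
import random

def partition(arr, lo, hi):
    # Lomuto partition, as in the section above
    pivot = arr[hi]
    i = lo
    for j in range(lo, hi):
        if arr[j] < pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    arr[i], arr[hi] = arr[hi], arr[i]
    return i

def randomized_quicksort(arr, lo, hi):
    if lo < hi:
        # Random pivot: makes the O(n^2) worst case vanishingly
        # unlikely on any fixed input, including sorted arrays.
        r = random.randint(lo, hi)
        arr[r], arr[hi] = arr[hi], arr[r]
        p = partition(arr, lo, hi)
        randomized_quicksort(arr, lo, p - 1)
        randomized_quicksort(arr, p + 1, hi)
```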
1.7 Heap Sort
Uses the max-heap data structure.
```python
def heapsort(arr):
    n = len(arr)
    # Build a max-heap bottom-up
    for i in range(n // 2 - 1, -1, -1):
        heapify(arr, n, i)
    # Repeatedly move the maximum to the end and shrink the heap
    for i in range(n - 1, 0, -1):
        arr[0], arr[i] = arr[i], arr[0]
        heapify(arr, i, 0)

def heapify(arr, n, i):
    largest = i
    left, right = 2 * i + 1, 2 * i + 2
    if left < n and arr[left] > arr[largest]:
        largest = left
    if right < n and arr[right] > arr[largest]:
        largest = right
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        heapify(arr, n, largest)
```
- All cases: \(O(n \log n)\)
- Unstable sort, in-place sort
- Heap construction: \(O(n)\)
2. Non-Comparison Sorting
2.1 Counting Sort
Suitable for integers with a limited range.
```python
def counting_sort(arr, max_val):
    # Simplified version: rebuilds values from counts, so it cannot carry
    # satellite data. The stable textbook variant places whole records
    # into an output array using prefix sums over the counts.
    count = [0] * (max_val + 1)
    for x in arr:
        count[x] += 1
    result = []
    for val, cnt in enumerate(count):
        result.extend([val] * cnt)
    return result
```
- Time: \(O(n + k)\), where \(k\) is the range
- Space: \(O(n + k)\)
- Stable when implemented with prefix sums over whole records (the simplified version above rebuilds values, which suffices for plain integers)
2.2 Radix Sort
Sort digit by digit from the least significant to the most significant, using a stable sort at each step.
```python
def radix_sort(arr):
    max_val = max(arr)
    exp = 1
    while max_val // exp > 0:
        # counting_sort_by_digit: a stable counting sort keyed on
        # the current digit, (x // exp) % 10
        counting_sort_by_digit(arr, exp)
        exp *= 10
```
- Time: \(O(d \cdot (n + b))\), where \(d\) is the number of digits and \(b\) is the base
- Space: \(O(n + b)\)
- Stable sort
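As a runnable sketch, one way to fill in the `counting_sort_by_digit` helper referenced above is a stable counting sort keyed on `(x // exp) % 10`, with a reverse scan to preserve the order of equal digits:

```python
def counting_sort_by_digit(arr, exp):
    # Stable counting sort on the digit (x // exp) % 10
    n = len(arr)
    output = [0] * n
    count = [0] * 10
    for x in arr:
        count[(x // exp) % 10] += 1
    for d in range(1, 10):
        count[d] += count[d - 1]      # prefix sums: end position per digit
    for x in reversed(arr):           # reverse scan keeps the pass stable
        count[(x // exp) % 10] -= 1
        output[count[(x // exp) % 10]] = x
    arr[:] = output

def radix_sort(arr):
    if not arr:
        return arr
    exp = 1
    while max(arr) // exp > 0:
        counting_sort_by_digit(arr, exp)
        exp *= 10
    return arr
```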
2.3 Bucket Sort
Distribute elements into several buckets, sort within each bucket, then concatenate.
- Average time: \(O(n + k)\) (for uniformly distributed data)
- Worst: \(O(n^2)\) (all elements in one bucket)
- Suitable for sorting uniformly distributed floating-point numbers
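A minimal sketch of the distribute/sort/concatenate scheme, assuming values lie in \([0, 1)\) so that each bucket covers an equal slice of the range; the per-bucket sort here is simply Python's built-in `sorted`:

```python
def bucket_sort(arr, num_buckets=10):
    # Assumes values in [0, 1); bucket index is proportional to the value
    buckets = [[] for _ in range(num_buckets)]
    for x in arr:
        buckets[min(int(x * num_buckets), num_buckets - 1)].append(x)
    result = []
    for b in buckets:
        result.extend(sorted(b))   # buckets are small on average, any sort works
    return result
```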
3. Sorting Algorithm Comparison
| Algorithm | Best | Average | Worst | Space | Stable | In-Place |
|---|---|---|---|---|---|---|
| Bubble Sort | \(O(n)\) | \(O(n^2)\) | \(O(n^2)\) | \(O(1)\) | Yes | Yes |
| Selection Sort | \(O(n^2)\) | \(O(n^2)\) | \(O(n^2)\) | \(O(1)\) | No | Yes |
| Insertion Sort | \(O(n)\) | \(O(n^2)\) | \(O(n^2)\) | \(O(1)\) | Yes | Yes |
| Merge Sort | \(O(n\log n)\) | \(O(n\log n)\) | \(O(n\log n)\) | \(O(n)\) | Yes | No |
| Quick Sort | \(O(n\log n)\) | \(O(n\log n)\) | \(O(n^2)\) | \(O(\log n)\) | No | Yes |
| Heap Sort | \(O(n\log n)\) | \(O(n\log n)\) | \(O(n\log n)\) | \(O(1)\) | No | Yes |
| Counting Sort | \(O(n+k)\) | \(O(n+k)\) | \(O(n+k)\) | \(O(n+k)\) | Yes | No |
| Radix Sort | \(O(dn)\) | \(O(dn)\) | \(O(dn)\) | \(O(n+b)\) | Yes | No |
| Bucket Sort | \(O(n+k)\) | \(O(n+k)\) | \(O(n^2)\) | \(O(n+k)\) | Yes | No |
```mermaid
graph TD
    Q1{Large dataset?} -->|No| INS[Insertion Sort]
    Q1 -->|Yes| Q2{Limited range?}
    Q2 -->|Yes| COUNT[Counting/Radix Sort]
    Q2 -->|No| Q3{Stability needed?}
    Q3 -->|Yes| MERGE[Merge Sort]
    Q3 -->|No| Q4{Memory constrained?}
    Q4 -->|Yes| HEAP[Heap Sort]
    Q4 -->|No| QUICK[Quick Sort]
```
4. Binary Search
4.1 Basic Binary Search
Search for a target value in a sorted array.
```python
def binary_search(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2  # avoids overflow in fixed-width integer languages
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```
- Time: \(O(\log n)\)
- Space: \(O(1)\)
4.2 Binary Search Variants
Find the first position equal to target:
```python
def lower_bound(arr, target):
    lo, hi = 0, len(arr)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo
```
Find the first position greater than target:
```python
def upper_bound(arr, target):
    lo, hi = 0, len(arr)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if arr[mid] <= target:
            lo = mid + 1
        else:
            hi = mid
    return lo
```
Search in a rotated sorted array:
```python
def search_rotated(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if arr[mid] == target:
            return mid
        if arr[lo] <= arr[mid]:  # left half is sorted
            if arr[lo] <= target < arr[mid]:
                hi = mid - 1
            else:
                lo = mid + 1
        else:                    # right half is sorted
            if arr[mid] < target <= arr[hi]:
                lo = mid + 1
            else:
                hi = mid - 1
    return -1
```
4.3 Applications of Binary Search
| Application | Description |
|---|---|
| Sorted array lookup | Basic application |
| Square root computation | Binary approximation of \(\sqrt{x}\) |
| Minimize the maximum | Binary search on the answer |
| Allocation problems | Binary search + greedy verification |
| Peak finding | Mountain array |
Binary Search Thinking Framework
Binary search applies not only to "finding an element in a sorted array" but more broadly to any decision problem with monotonicity: if the answer \(x\) is feasible, then \(x+1\) is also feasible (or vice versa). This "binary search on the answer" paradigm is very powerful.
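A small illustration of this paradigm, using the square-root entry from the applications table: the predicate `m * m <= x` is monotone in `m`, so the largest feasible `m` is \(\lfloor\sqrt{x}\rfloor\) and can be found by binary search on the answer.

```python
def isqrt(x):
    # Largest m with m*m <= x; feasibility is monotone, so binary search applies
    lo, hi = 0, x
    while lo < hi:
        mid = lo + (hi - lo + 1) // 2  # bias up when seeking the LAST feasible value
        if mid * mid <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo
```

Note the upward-biased midpoint: when searching for the last value satisfying a predicate, `(hi - lo) // 2` would loop forever once `lo` and `hi` are adjacent.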
5. Other Search Algorithms
5.1 Interpolation Search
For uniformly distributed sorted data, estimate the target position by linear interpolation instead of always probing the midpoint:

\[
pos = lo + \left\lfloor \frac{(target - arr[lo])(hi - lo)}{arr[hi] - arr[lo]} \right\rfloor
\]

- Average: \(O(\log \log n)\) (uniform distribution)
- Worst: \(O(n)\) (e.g., exponentially spaced values)
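A minimal sketch of the probe-and-narrow loop; the guard against `arr[hi] == arr[lo]` avoids division by zero on constant runs:

```python
def interpolation_search(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi and arr[lo] <= target <= arr[hi]:
        if arr[hi] == arr[lo]:               # constant run: avoid division by zero
            return lo if arr[lo] == target else -1
        # Probe proportionally to where target sits in the value range
        pos = lo + (target - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
        if arr[pos] == target:
            return pos
        elif arr[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1
```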
5.2 Exponential Search
First grow the range exponentially, then binary search within it:
```python
def exponential_search(arr, target):
    if not arr:
        return -1
    if arr[0] == target:
        return 0
    bound = 1
    while bound < len(arr) and arr[bound] <= target:
        bound *= 2
    # Standard binary search restricted to [bound // 2, min(bound, len(arr) - 1)]
    lo, hi = bound // 2, min(bound, len(arr) - 1)
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```
- Time: \(O(\log n)\)
- Suitable for unbounded or very large sorted sequences
6. Selection Algorithms
6.1 The \(k\)-th Smallest Element
QuickSelect:
```python
def quickselect(arr, lo, hi, k):
    # Returns the k-th smallest element (0-indexed);
    # reuses partition from the quicksort section.
    if lo == hi:
        return arr[lo]
    p = partition(arr, lo, hi)
    if k == p:
        return arr[p]
    elif k < p:
        return quickselect(arr, lo, p - 1, k)
    else:
        return quickselect(arr, p + 1, hi, k)
```
- Average: \(O(n)\)
- Worst: \(O(n^2)\)
Median of Medians:
- Divide the array into \(\lceil n/5 \rceil\) groups of 5
- Recursively find the median of medians as the pivot
- Guarantees worst-case \(O(n)\)
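The three steps above can be sketched as follows, written functionally (building new lists) for clarity rather than in-place:

```python
def mom_select(arr, k):
    # k-th smallest, 0-indexed; worst-case O(n) via the median-of-medians pivot
    if len(arr) <= 5:
        return sorted(arr)[k]
    # 1. Median of each group of 5
    medians = [sorted(arr[i:i + 5])[len(arr[i:i + 5]) // 2]
               for i in range(0, len(arr), 5)]
    # 2. Pivot = median of the medians, found recursively
    pivot = mom_select(medians, len(medians) // 2)
    # 3. Three-way partition around the pivot
    lows   = [x for x in arr if x < pivot]
    pivots = [x for x in arr if x == pivot]
    highs  = [x for x in arr if x > pivot]
    if k < len(lows):
        return mom_select(lows, k)
    elif k < len(lows) + len(pivots):
        return pivot
    else:
        return mom_select(highs, k - len(lows) - len(pivots))
```

The pivot choice guarantees that roughly 30% of the elements fall on each side, which bounds the recursion and yields linear worst-case time.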
References
- Cormen et al., Introduction to Algorithms (CLRS), Chapters 2, 6-9
- Sedgewick, Algorithms