# Advanced Python

## Introduction
Python is renowned for its simplicity and elegance, yet its internal mechanisms are far more complex than they appear. This article explores Python's advanced features, memory model, concurrency mechanisms, and performance optimization techniques, helping readers progress from proficiency to mastery.
## 1. Generators and yield

### 1.1 Generator Functions

Generators implement lazy evaluation through the `yield` keyword, producing one value at a time and suspending execution in between:

```python
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Usage
gen = fibonacci()
for _ in range(10):
    print(next(gen))  # 0, 1, 1, 2, 3, 5, 8, 13, 21, 34
```
### 1.2 Generator Expressions

```python
# List comprehension: builds all elements immediately, consuming memory
squares_list = [x**2 for x in range(10**6)]  # ~8MB

# Generator expression: lazy evaluation, minimal memory usage
squares_gen = (x**2 for x in range(10**6))   # ~128B for the generator object
```
### 1.3 yield from

`yield from` delegates to a sub-generator:

```python
def chain(*iterables):
    for it in iterables:
        yield from it

list(chain([1, 2], [3, 4], [5, 6]))  # [1, 2, 3, 4, 5, 6]
```
### 1.4 Coroutine-Style Generators

Generators can also receive values via `send()` (the precursor to coroutines):

```python
def accumulator():
    total = 0
    while True:
        value = yield total
        total += value

acc = accumulator()
next(acc)     # prime it, returns 0
acc.send(10)  # returns 10
acc.send(20)  # returns 30
acc.send(5)   # returns 35
```
## 2. Decorators

Decorators are higher-order functions that modify the behavior of functions or classes.

### 2.1 Basic Decorator

```python
import functools
import time

def timer(func):
    @functools.wraps(func)  # preserve the original function's metadata
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.4f}s")
        return result
    return wrapper

@timer
def slow_function():
    time.sleep(1)
```
### 2.2 Decorators with Parameters

```python
def retry(max_attempts=3, delay=1):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator

@retry(max_attempts=5, delay=0.5)
def unreliable_api_call():
    ...
```
### 2.3 Class Decorators

```python
def singleton(cls):
    instances = {}
    @functools.wraps(cls)
    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]
    return get_instance

@singleton
class Database:
    def __init__(self):
        print("Connecting...")
```
## 3. Context Managers

### 3.1 Class-Based Implementation

```python
class FileManager:
    def __init__(self, filename, mode):
        self.filename = filename
        self.mode = mode
        self.file = None

    def __enter__(self):
        self.file = open(self.filename, self.mode)
        return self.file

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.file.close()
        return False  # do not suppress exceptions

with FileManager("test.txt", "w") as f:
    f.write("Hello")
```
### 3.2 contextlib-Based Implementation

```python
from contextlib import contextmanager

@contextmanager
def timer_context(name):
    start = time.perf_counter()
    try:
        yield  # the with block's body executes here
    finally:
        elapsed = time.perf_counter() - start
        print(f"{name}: {elapsed:.4f}s")

with timer_context("Training"):
    model.fit(X, y)  # any long-running work (names illustrative)
```
## 4. Metaclasses

A metaclass is the class of a class: it controls how classes themselves are created.

```python
class SingletonMeta(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

class Database(metaclass=SingletonMeta):
    def __init__(self):
        self.connection = "connected"

# Database() is Database() => True
```
Practical applications of metaclasses:

- Django ORM: uses metaclasses to automatically map class attributes to database fields
- ABC: `abc.ABCMeta` implements abstract base classes
- Enum: `enum.Enum` relies on a metaclass to collect and finalize its members (by contrast, `@dataclass` is a plain class decorator and needs no metaclass)
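As a concrete illustration of the ABC case, here is a minimal sketch of `abc.ABCMeta` at work (the `Shape`/`Square` names are invented for the example):

```python
import abc

class Shape(abc.ABC):  # abc.ABC's metaclass is abc.ABCMeta
    @abc.abstractmethod
    def area(self) -> float: ...

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self) -> float:
        return self.side * self.side

print(Square(3).area())  # 9
# Shape() would raise TypeError: can't instantiate an abstract class
```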
## 5. GIL (Global Interpreter Lock)

### 5.1 What Is the GIL

The GIL (Global Interpreter Lock) is a mutex in CPython that ensures only one thread executes Python bytecode at any given time.

```text
Thread 1: ████----████----████
Thread 2: ----████----████----
          ↑ only one thread executes at a time
```
### 5.2 Impact of the GIL
| Scenario | Impact | Recommendation |
|---|---|---|
| CPU-bound | Multi-threading cannot utilize multiple cores | Use multiprocessing |
| I/O-bound | GIL is released during I/O waits | Multi-threading is effective |
| C extensions | Can manually release the GIL | NumPy, etc. are already optimized |
### 5.3 Bypassing the GIL

```python
# Option 1: multiprocessing
from multiprocessing import Pool

with Pool(4) as p:
    results = p.map(cpu_intensive_func, data)

# Option 2: concurrent.futures
from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(func, arg) for arg in args]
```

**Python 3.13+**: PEP 703 proposes an optional free-threaded CPython (no-GIL mode), with experimental support starting in Python 3.13.
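On 3.13+ you can probe the build at runtime; a minimal sketch (the private `sys._is_gil_enabled()` helper only exists from 3.13, hence the `getattr` fallback):

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 on free-threaded builds, 0/None otherwise
free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# sys._is_gil_enabled() (3.13+) reports whether the GIL is active right now
gil_probe = getattr(sys, "_is_gil_enabled", None)
print("free-threaded build:", free_threaded)
print("GIL enabled:", gil_probe() if gil_probe else "n/a (pre-3.13)")
```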
## 6. asyncio and Coroutines

### 6.1 async/await Basics

```python
import asyncio

async def fetch_data(url):
    print(f"Fetching {url}...")
    await asyncio.sleep(1)  # simulate a network request
    return f"Data from {url}"

async def main():
    # Run multiple coroutines concurrently
    tasks = [
        fetch_data("url1"),
        fetch_data("url2"),
        fetch_data("url3"),
    ]
    results = await asyncio.gather(*tasks)
    print(results)

asyncio.run(main())  # takes about 1 second total (not 3)
```
6.2 Event Loop
┌─────────────────────────────────┐
│ Event Loop │
│ ┌────┐ ┌────┐ ┌────┐ │
│ │Task1│ │Task2│ │Task3│ │
│ └──┬─┘ └──┬─┘ └──┬─┘ │
│ │ await │ │ │
│ ▼ ▼ ▼ │
│ [Suspended] [Running] [Waiting]│
│ │ │ │ │
│ ▼ ▼ ▼ │
│ [Resumed] [Suspended] [Running]│
└─────────────────────────────────┘
### 6.3 Async Iterators and Context Managers

```python
async def async_range(n):
    for i in range(n):
        await asyncio.sleep(0.1)
        yield i

async def main():
    async for i in async_range(5):
        print(i)

asyncio.run(main())
```
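The async counterpart of `@contextmanager` completes the picture; a minimal sketch, with an invented `open_connection` helper standing in for a real async resource:

```python
import asyncio
from contextlib import asynccontextmanager

@asynccontextmanager
async def open_connection(host):
    conn = f"connection to {host}"  # stand-in for an awaitable connect call
    try:
        yield conn                  # the async with body runs here
    finally:
        print(f"closed {conn}")     # cleanup runs even on exceptions

async def main():
    async with open_connection("example.com") as conn:
        await asyncio.sleep(0.01)
        return conn

print(asyncio.run(main()))  # connection to example.com
```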
## 7. Memory Model

### 7.1 Reference Counting + Garbage Collection

```python
import sys

a = [1, 2, 3]
print(sys.getrefcount(a))  # 2 (a itself + the getrefcount argument)
b = a
print(sys.getrefcount(a))  # 3
del b
print(sys.getrefcount(a))  # 2
```
Circular reference handling:

```python
import gc

class Node:
    def __init__(self):
        self.ref = None

a = Node()
b = Node()
a.ref = b
b.ref = a  # circular reference

del a, b
# Reference counts never drop to 0; the gc module's
# mark-and-sweep collector is needed
gc.collect()  # collects the cycle
```
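One way to avoid creating such cycles in the first place is `weakref`: a weak reference does not keep its target alive, so reference counting alone can reclaim the objects. A minimal sketch:

```python
import weakref

class Node:
    def __init__(self):
        self.ref = None

a = Node()
b = Node()
a.ref = b
b.ref = weakref.ref(a)  # weak back-reference: no cycle of strong refs

assert b.ref() is a     # dereference by calling; returns the target
del a                   # refcount drops to 0 immediately, no gc needed
assert b.ref() is None  # the weak reference now reports a dead target
```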
### 7.2 Python Object Memory Layout

Every Python object contains at least:

- Reference count (`ob_refcnt`)
- Type pointer (`ob_type`)
- Object data

A small nonzero int object (28 bytes on a 64-bit build):

```text
┌───────────────┐
│ ob_refcnt: 8B │
│ ob_type:   8B │
│ ob_size:   8B │
│ ob_digit:  4B │ ← actual value (one 30-bit digit)
└───────────────┘
```
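You can observe this layout indirectly with `sys.getsizeof`, which reports the full allocation including headers; the exact numbers vary slightly across CPython versions:

```python
import sys

print(sys.getsizeof(1))      # header + one digit
print(sys.getsizeof(2**60))  # larger ints need more digits

# The size grows in digit-sized increments as the value needs more bits
assert sys.getsizeof(2**60) > sys.getsizeof(1)
```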
### 7.3 Small Integer Cache and String Interning

```python
# Small integer cache: [-5, 256]
a = 256
b = 256
a is b  # True (same cached object)

a = 257
b = 257
a is b  # False (distinct objects when entered line by line in the REPL)

# String interning
a = "hello"
b = "hello"
a is b  # True (identifier-like literals are interned)
```
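Interning can also be requested explicitly with `sys.intern`, which lets many equal strings share one object and be compared by identity. The strings below are built at runtime so the compiler's constant sharing does not kick in:

```python
import sys

a = "".join(["hello", "_", "world"])
b = "".join(["hello", "_", "world"])
print(a == b, a is b)  # True False: equal but distinct objects

a = sys.intern(a)
b = sys.intern(b)
print(a is b)          # True: both now point to one interned copy
```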
## 8. Type Hints

### 8.1 Basic Usage

```python
from typing import Optional, Union, List, Dict, Tuple, Callable

def greet(name: str) -> str:
    return f"Hello, {name}"

def process(data: List[int],
            callback: Callable[[int], bool],
            config: Optional[Dict[str, str]] = None) -> Tuple[int, ...]:
    ...
```
### 8.2 Generics and Protocol

```python
from typing import TypeVar, Generic, List, Protocol

T = TypeVar('T')

class Stack(Generic[T]):
    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()

# Structural subtyping (the type-annotated version of duck typing)
class Drawable(Protocol):
    def draw(self) -> None: ...
```
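Because Protocol matching is structural, a class never has to inherit from `Drawable`; with `@runtime_checkable` the match can even be tested via `isinstance` (which checks method presence only, not signatures). A small sketch:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Drawable(Protocol):
    def draw(self) -> None: ...

class Circle:  # no relation to Drawable in the class hierarchy
    def draw(self) -> None:
        print("drawing a circle")

print(isinstance(Circle(), Drawable))  # True: structural match
print(isinstance(object(), Drawable))  # False: no draw method
```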
### 8.3 Toolchain
- mypy: static type checker
- pyright: developed by Microsoft, integrated with VS Code
- pydantic: runtime type validation
## 9. Performance Optimization

### 9.1 Profiling

```python
# cProfile: function-level profiling
import cProfile
cProfile.run('my_function()')

# line_profiler: line-level profiling; @profile is injected by kernprof
@profile
def slow_function():
    ...
# Run with: kernprof -l -v script.py
```
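For micro-benchmarks of single statements, `timeit` complements the profilers above by running a snippet many times in a clean namespace:

```python
import timeit

setup = "data = list(range(1000))"
t_comp = timeit.timeit("[x * 2 for x in data]", setup=setup, number=1000)
t_map = timeit.timeit("list(map(lambda x: x * 2, data))", setup=setup, number=1000)
print(f"comprehension: {t_comp:.4f}s  map+lambda: {t_map:.4f}s")
```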
### 9.2 Acceleration Approaches
| Approach | Mechanism | Speedup | Use Case |
|---|---|---|---|
| NumPy | Vectorized operations in C | 10-100x | Numerical computation |
| Cython | Python → C compilation | 10-100x | Compute-intensive loops |
| Numba | LLVM JIT compilation | 10-200x | Numerical loops |
| PyPy | Whole-program JIT compilation | 2-10x | General Python |
| multiprocessing | Multi-process to bypass GIL | Nx (number of cores) | CPU-bound tasks |
```python
# Numba example
import random
from numba import jit

@jit(nopython=True)
def monte_carlo_pi(n):
    count = 0
    for i in range(n):
        x = random.random()
        y = random.random()
        if x**2 + y**2 <= 1.0:
            count += 1
    return 4.0 * count / n
```
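For comparison, the same Monte Carlo estimate written against NumPy's vectorized operations (the function name and seed below are illustrative):

```python
import numpy as np

def monte_carlo_pi_np(n, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.random(n)
    y = rng.random(n)
    # count_nonzero over a boolean mask replaces the Python-level loop
    return 4.0 * np.count_nonzero(x**2 + y**2 <= 1.0) / n

print(monte_carlo_pi_np(1_000_000))  # ≈ 3.14
```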
## 10. Data Model (Dunder Methods)

Python's data model uses special methods (double-underscore "dunder" methods) to make custom objects behave like built-in types.

### 10.1 Core Special Methods

```python
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Vector({self.x}, {self.y})"

    def __str__(self):
        return f"({self.x}, {self.y})"

    def __add__(self, other):
        return Vector(self.x + other.x, self.y + other.y)

    def __abs__(self):
        return (self.x**2 + self.y**2) ** 0.5

    def __bool__(self):
        return abs(self) != 0

    def __eq__(self, other):
        return self.x == other.x and self.y == other.y

    def __hash__(self):
        return hash((self.x, self.y))

    def __len__(self):
        return 2

    def __getitem__(self, index):
        if index == 0:
            return self.x
        if index == 1:
            return self.y
        raise IndexError(index)
```
### 10.2 Descriptor Protocol

```python
class Validator:
    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        return getattr(obj, f'_{self.name}', None)

    def __set__(self, obj, value):
        if not isinstance(value, (int, float)):
            raise TypeError(f"{self.name} must be a number")
        if value < 0:
            raise ValueError(f"{self.name} must be >= 0")
        setattr(obj, f'_{self.name}', value)

class Product:
    price = Validator()
    quantity = Validator()
```
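A usage sketch (repeating the definitions so the snippet runs standalone):

```python
class Validator:
    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        return getattr(obj, f'_{self.name}', None)

    def __set__(self, obj, value):
        if not isinstance(value, (int, float)):
            raise TypeError(f"{self.name} must be a number")
        if value < 0:
            raise ValueError(f"{self.name} must be >= 0")
        setattr(obj, f'_{self.name}', value)

class Product:
    price = Validator()
    quantity = Validator()

p = Product()
p.price = 9.99       # accepted by __set__
print(p.price)       # 9.99
try:
    p.quantity = -1  # rejected by __set__
except ValueError as e:
    print(e)         # quantity must be >= 0
```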
## References
- "Fluent Python" - Luciano Ramalho
- "CPython Internals" - Anthony Shaw
- Python Official Documentation: Data Model, asyncio