Advanced Python

Introduction

Python is renowned for its simplicity and elegance, yet its internal mechanisms are far more complex than they appear. This article explores Python's advanced features, memory model, concurrency mechanisms, and performance optimization techniques, helping readers progress from proficiency to mastery.


1. Generators and yield

1.1 Generator Functions

Generators implement lazy evaluation through the yield keyword, producing one value at a time and suspending execution:

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Usage
gen = fibonacci()
for _ in range(10):
    print(next(gen))  # 0, 1, 1, 2, 3, 5, 8, 13, 21, 34
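
Because this generator never terminates, it is usually consumed through a bounded view rather than repeated next() calls. A minimal sketch using itertools.islice from the standard library; the helper name take is an illustrative choice, not part of the original example:

```python
from itertools import islice

def fibonacci():
    """Infinite Fibonacci generator, as defined above."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

def take(n, iterable):
    """Return the first n items of any iterable as a list (hypothetical helper)."""
    return list(islice(iterable, n))

first_ten = take(10, fibonacci())
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

islice stops pulling values after n items, so the infinite loop inside the generator is never a problem.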

1.2 Generator Expressions

# List comprehension: builds every element up front
squares_list = [x**2 for x in range(10**6)]  # ~8 MB for the list alone on 64-bit CPython, plus the int objects

# Generator expression: lazy evaluation, constant memory
squares_gen = (x**2 for x in range(10**6))   # a few hundred bytes at most, whatever the range size
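
The difference can be checked with sys.getsizeof. The sketch below assumes CPython and uses a smaller range than the one above so it runs quickly; exact byte counts vary by version and platform, so none are hard-coded:

```python
import sys

n = 10**5  # smaller than the 10**6 above so the check runs quickly

squares_list = [x**2 for x in range(n)]
squares_gen = (x**2 for x in range(n))

# getsizeof reports the container only, not the int objects it references
list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)
print(f"list: {list_bytes} bytes, generator: {gen_bytes} bytes")

# Both produce the same values, but a generator can be consumed only once
total = sum(squares_gen)
leftover = sum(squares_gen)  # 0, because the generator is already exhausted
```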

1.3 yield from

yield from delegates to a sub-generator:

def chain(*iterables):
    for it in iterables:
        yield from it

list(chain([1,2], [3,4], [5,6]))  # [1, 2, 3, 4, 5, 6]
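
yield from does more than flatten iteration: the value a sub-generator returns becomes the value of the yield from expression. A small sketch with hypothetical names (read_chunk, read_all) chosen for illustration:

```python
def read_chunk(items):
    """Sub-generator: yields each item, then returns how many it produced."""
    count = 0
    for item in items:
        yield item
        count += 1
    return count  # becomes the value of the `yield from` expression

def read_all(*chunks):
    """Delegating generator: forwards items and records per-chunk counts."""
    counts = []
    for chunk in chunks:
        n = yield from read_chunk(chunk)
        counts.append(n)
    return counts

gen = read_all([1, 2], [3], [4, 5, 6])
items = []
while True:
    try:
        items.append(next(gen))
    except StopIteration as stop:
        counts = stop.value  # return value of the outer generator
        break

print(items)   # [1, 2, 3, 4, 5, 6]
print(counts)  # [2, 1, 3]
```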

1.4 Coroutine-Style Generators

Generators can also receive values (the precursor to coroutines):

def accumulator():
    total = 0
    while True:
        value = yield total
        total += value

acc = accumulator()
next(acc)          # prime it, returns 0
acc.send(10)       # returns 10
acc.send(20)       # returns 30
acc.send(5)        # returns 35
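
A generator that receives values should also expect to be shut down. Calling close() raises GeneratorExit at the paused yield, which gives the generator a chance to clean up. A sketch of the same idea as the accumulator, here computing a running mean (the function name is illustrative):

```python
def running_average():
    """Coroutine-style generator: receives numbers, yields the mean so far."""
    total, count, average = 0.0, 0, None
    try:
        while True:
            value = yield average
            total += value
            count += 1
            average = total / count
    except GeneratorExit:
        # close() raises GeneratorExit at the paused yield; clean up here
        print(f"closed after {count} values")
        raise

avg = running_average()
next(avg)                                   # prime: run to the first yield
results = [avg.send(v) for v in (10, 20, 60)]
print(results)                              # [10.0, 15.0, 30.0]
avg.close()
```

After close(), any further send() raises StopIteration.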

2. Decorators

Decorators are higher-order functions that modify the behavior of functions or classes.

2.1 Basic Decorator

import functools
import time

def timer(func):
    @functools.wraps(func)  # preserve original function metadata
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.4f}s")
        return result
    return wrapper

@timer
def slow_function():
    time.sleep(1)
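
The effect of functools.wraps is easiest to see by comparing a decorator that uses it with one that does not. The two decorators below (bare and wrapped) are hypothetical and do nothing except forward the call:

```python
import functools

def bare(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def wrapped(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@bare
def add(a, b):
    """Add two numbers."""
    return a + b

@wrapped
def mul(a, b):
    """Multiply two numbers."""
    return a * b

print(add.__name__, add.__doc__)   # wrapper None
print(mul.__name__, mul.__doc__)   # mul Multiply two numbers.
print(mul.__wrapped__(3, 4))       # 12, calling the undecorated function
```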

2.2 Decorators with Parameters

def retry(max_attempts=3, delay=1):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    if attempt == max_attempts - 1:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator

@retry(max_attempts=5, delay=0.5)
def unreliable_api_call():
    ...
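
To see the retry loop in action, the sketch below repeats the decorator so it runs on its own and applies it to a hypothetical function, flaky, that fails twice before succeeding. The delay is set to 0 only to keep the demo fast:

```python
import functools
import time

def retry(max_attempts=3, delay=1):
    """Same decorator as above, repeated so the example is self-contained."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise          # out of attempts: re-raise the last error
                    time.sleep(delay)
        return wrapper
    return decorator

calls = {"count": 0}

@retry(max_attempts=5, delay=0)
def flaky():
    """Hypothetical stand-in for an unreliable call: fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

result = flaky()
print(result, calls["count"])   # ok 3
```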

2.3 Class Decorators

def singleton(cls):
    instances = {}
    @functools.wraps(cls)
    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]
    return get_instance

@singleton
class Database:
    def __init__(self):
        print("Connecting...")
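
One consequence of this pattern is worth checking: after decoration the name Database refers to a function, not a class. The sketch below repeats the decorator and shows both the shared instance and that caveat:

```python
import functools

def singleton(cls):
    instances = {}
    @functools.wraps(cls)
    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]
    return get_instance

@singleton
class Database:
    def __init__(self):
        self.connection = "connected"

first = Database()
second = Database()
print(first is second)                         # True: one shared instance
print(isinstance(Database, type))              # False: the name is now a function
print(isinstance(first, Database.__wrapped__)) # True: wraps() keeps the class here
```

Code that needs isinstance checks or subclassing may be better served by the metaclass approach in section 4.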

3. Context Managers

3.1 Class-Based Implementation

class FileManager:
    def __init__(self, filename, mode):
        self.filename = filename
        self.mode = mode
        self.file = None

    def __enter__(self):
        self.file = open(self.filename, self.mode)
        return self.file

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.file.close()
        return False  # do not suppress exceptions

with FileManager("test.txt", "w") as f:
    f.write("Hello")
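
The return value of __exit__ decides whether an exception raised inside the with block propagates. A minimal sketch with a hypothetical Suppress class (the standard library offers contextlib.suppress for real use):

```python
class Suppress:
    """Hypothetical context manager that swallows the given exception types."""
    def __init__(self, *exc_types):
        self.exc_types = exc_types
        self.caught = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # exc_type is None when the block finished without raising
        if exc_type is not None and issubclass(exc_type, self.exc_types):
            self.caught = exc_val
            return True   # a truthy return value suppresses the exception
        return False      # anything else propagates to the caller

with Suppress(ZeroDivisionError) as guard:
    1 / 0

print(repr(guard.caught))  # ZeroDivisionError('division by zero')
```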

3.2 contextlib-Based Implementation

from contextlib import contextmanager

@contextmanager
def timer_context(name):
    start = time.perf_counter()
    try:
        yield  # code in the with block executes here
    finally:
        elapsed = time.perf_counter() - start
        print(f"{name}: {elapsed:.4f}s")

with timer_context("Training"):
    model.fit(X, y)
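
A generator-based context manager can also hand a value to the with block, and its finally clause runs even when the block raises. A self-contained sketch; stopwatch and the dict it yields are illustrative names:

```python
import time
from contextlib import contextmanager

@contextmanager
def stopwatch():
    """Yield a dict that receives the elapsed time when the block exits."""
    record = {"elapsed": None}
    start = time.perf_counter()
    try:
        yield record               # bound to the name after `as`
    finally:
        record["elapsed"] = time.perf_counter() - start

with stopwatch() as outer:
    time.sleep(0.05)

# The finally clause also runs when the block raises
try:
    with stopwatch() as failed:
        raise ValueError("boom")
except ValueError:
    pass

print(f"outer: {outer['elapsed']:.3f}s, failed block timed: {failed['elapsed'] is not None}")
```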

4. Metaclasses

A metaclass is the class of a class: it controls how classes themselves are created.

class SingletonMeta(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

class Database(metaclass=SingletonMeta):
    def __init__(self):
        self.connection = "connected"

# Database() is Database()  => True

Practical applications of metaclasses:

  • Django ORM: uses metaclasses to automatically map class attributes to database fields
  • ABC: abc.ABCMeta implements abstract base classes
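  • Enum: enum.EnumMeta (named enum.EnumType since Python 3.11) turns class attributes into enum members at class creation

Note that @dataclass is not on this list: it is a class decorator that generates methods such as __init__ and __repr__ and attaches them to the class, with no metaclass involved.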
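
The field-collecting idea behind ORM-style metaclasses can be sketched in a few lines. This is a simplified illustration under assumed names (Field, ModelMeta, Model), not how Django actually implements it:

```python
class Field:
    """Hypothetical column stand-in; it only stores a type name."""
    def __init__(self, column_type):
        self.column_type = column_type

class ModelMeta(type):
    def __new__(mcls, name, bases, namespace):
        # Runs once per class definition, before any instance exists
        fields = {k: v for k, v in namespace.items() if isinstance(v, Field)}
        cls = super().__new__(mcls, name, bases, namespace)
        cls._fields = fields
        return cls

class Model(metaclass=ModelMeta):
    pass

class User(Model):
    id = Field("INTEGER")
    name = Field("TEXT")

print(list(User._fields))        # ['id', 'name']
print(type(User) is ModelMeta)   # True
```

Subclasses inherit the metaclass, which is why User needs no metaclass argument of its own.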

5. GIL (Global Interpreter Lock)

5.1 What is the GIL

The GIL (Global Interpreter Lock) is a mutex in CPython that ensures only one thread executes Python bytecode at any given time.

Thread 1: ████----████----████
Thread 2: ----████----████----
           ↑ only one thread executes at a time

5.2 Impact of the GIL

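| Scenario     | Impact                                        | Recommendation                    |
|--------------|-----------------------------------------------|-----------------------------------|
| CPU-bound    | Multi-threading cannot utilize multiple cores | Use multiprocessing               |
| I/O-bound    | GIL is released during I/O waits              | Multi-threading is effective      |
| C extensions | Can manually release the GIL                  | NumPy, etc. are already optimized |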
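
The I/O-bound row can be observed directly. In the sketch below, time.sleep stands in for a blocking I/O call (it releases the GIL while waiting); fake_io is a hypothetical name:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(delay):
    """Stand-in for a blocking I/O call; time.sleep releases the GIL."""
    time.sleep(delay)
    return delay

delays = [0.2, 0.2, 0.2, 0.2]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_io, delays))
threaded = time.perf_counter() - start

print(f"4 waits of 0.2s took {threaded:.2f}s with threads")
```

Run sequentially the four waits would take about 0.8 s; with threads they overlap and finish in roughly the time of one.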

5.3 Bypassing the GIL

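# Option 1: multiprocessing
from multiprocessing import Pool

if __name__ == "__main__":  # required where workers are spawned (Windows, macOS)
    with Pool(4) as p:
        results = p.map(cpu_intensive_func, data)

# Option 2: concurrent.futures
from concurrent.futures import ProcessPoolExecutor

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(func, arg) for arg in args]
        results = [f.result() for f in futures]  # block until each task finishes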

Python 3.13+

PEP 703, since accepted, makes the GIL optional through a separate free-threaded CPython build (no-GIL mode); experimental support starts in Python 3.13.
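
Whether the running interpreter is a free-threaded build can be checked at run time. The sketch below relies on sysconfig's Py_GIL_DISABLED variable and on sys._is_gil_enabled(), an underscore-prefixed function added in 3.13; on older versions it falls back to assuming the GIL is present:

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 for free-threaded builds; None or 0 otherwise
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# sys._is_gil_enabled() exists from 3.13 on; older versions always have the GIL
check = getattr(sys, "_is_gil_enabled", None)
gil_enabled = check() if check is not None else True

print(f"free-threaded build: {free_threaded_build}, GIL enabled: {gil_enabled}")
```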


6. asyncio and Coroutines

6.1 async/await Basics

import asyncio

async def fetch_data(url):
    print(f"Fetching {url}...")
    await asyncio.sleep(1)  # simulate network request
    return f"Data from {url}"

async def main():
    # Run multiple coroutines concurrently
    tasks = [
        fetch_data("url1"),
        fetch_data("url2"),
        fetch_data("url3"),
    ]
    results = await asyncio.gather(*tasks)
    print(results)

asyncio.run(main())  # takes about 1 second total (not 3)
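
The "1 second, not 3" claim can be verified by timing a sequential version against asyncio.gather. A sketch with a shorter delay to keep it fast; sequential, concurrent, and timed are illustrative helper names:

```python
import asyncio
import time

async def fetch_data(url, delay=0.2):
    await asyncio.sleep(delay)   # shorter than the 1 s above to keep the demo fast
    return f"Data from {url}"

async def sequential(urls):
    # Each await finishes before the next call starts
    return [await fetch_data(u) for u in urls]

async def concurrent(urls):
    # gather schedules all coroutines, so their waits overlap
    return await asyncio.gather(*(fetch_data(u) for u in urls))

async def timed(coro):
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start

urls = ["url1", "url2", "url3"]
seq_result, seq_time = asyncio.run(timed(sequential(urls)))
con_result, con_time = asyncio.run(timed(concurrent(urls)))
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

Both versions return the same data in the same order; only the total waiting time differs.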

6.2 Event Loop

┌─────────────────────────────────┐
│         Event Loop              │
│  ┌────┐  ┌────┐  ┌────┐        │
│  │Task1│  │Task2│  │Task3│      │
│  └──┬─┘  └──┬─┘  └──┬─┘        │
│     │ await  │       │          │
│     ▼        ▼       ▼          │
│  [Suspended] [Running] [Waiting]│
│     │        │       │          │
│     ▼        ▼       ▼          │
│  [Resumed] [Suspended] [Running]│
└─────────────────────────────────┘

6.3 Async Iterators and Context Managers

async def async_range(n):
    for i in range(n):
        await asyncio.sleep(0.1)
        yield i

async def main():
    async for i in async_range(5):
        print(i)
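
The async counterpart of a context manager follows the same shape as in section 3.2, using contextlib.asynccontextmanager and async with. A sketch with a hypothetical connection resource whose sleeps stand in for network round trips:

```python
import asyncio
from contextlib import asynccontextmanager

events = []

@asynccontextmanager
async def connection(name):
    """Hypothetical async resource; the sleeps stand in for network round trips."""
    await asyncio.sleep(0.01)
    events.append(f"open {name}")
    try:
        yield name
    finally:
        await asyncio.sleep(0.01)
        events.append(f"close {name}")

async def async_range(n):
    for i in range(n):
        await asyncio.sleep(0.01)
        yield i

async def main():
    async with connection("db") as conn:
        async for i in async_range(3):
            events.append(f"{conn} row {i}")

asyncio.run(main())
print(events)
# ['open db', 'db row 0', 'db row 1', 'db row 2', 'close db']
```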

7. Memory Model

7.1 Reference Counting + Garbage Collection

import sys

a = [1, 2, 3]
print(sys.getrefcount(a))  # typically 2 (a + the temporary reference held by the call); exact counts are a CPython implementation detail

b = a
print(sys.getrefcount(a))  # 3

del b
print(sys.getrefcount(a))  # 2

Circular reference handling:

import gc

class Node:
    def __init__(self):
        self.ref = None

a = Node()
b = Node()
a.ref = b
b.ref = a  # circular reference

del a, b
# Reference counts never drop to 0 here; the cyclic garbage collector (gc module) has to find and free the pair
gc.collect()  # collects circular references
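
A weak reference makes the collection observable without keeping the object alive. The sketch below assumes CPython and disables automatic collection for the duration of the demo so the timing is deterministic:

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.ref = None

gc.disable()                 # keep the automatic collector out of the demo
a, b = Node(), Node()
a.ref, b.ref = b, a          # cycle: a -> b -> a
probe = weakref.ref(a)       # a weak reference does not keep `a` alive

del a, b
alive_before = probe() is not None   # the cycle is still in memory
found = gc.collect()                 # returns the number of unreachable objects found
alive_after = probe() is not None
gc.enable()

print(alive_before, alive_after, found >= 2)   # True False True
```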

7.2 Python Object Memory Layout

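Every Python object contains at least:

  • Reference count (ob_refcnt)
  • Type pointer (ob_type)
  • Object data

A small int object (28 bytes on 64-bit CPython, as reported by sys.getsizeof(1)); field names follow CPython up to 3.11, and 3.12 replaced ob_size with a tag field of the same size:
┌──────────────┐
│ ob_refcnt: 8B│
│ ob_type:   8B│
│ ob_size:   8B│  ← number of digits (and sign)
│ ob_digit:  4B│  ← actual value, one 30-bit digit
└──────────────┘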
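
Since an int stores its value as an array of digits, its size grows with magnitude. A quick check with sys.getsizeof; on a typical 64-bit CPython build the first three values below report 28, 32, and 36 bytes, but the exact numbers are an implementation detail:

```python
import sys

for value in (1, 2**30, 2**60, 2**1000):
    print(f"{value.bit_length():>5} bits -> {sys.getsizeof(value)} bytes")

# Each step past a digit boundary adds the same fixed number of bytes
sizes = [sys.getsizeof(v) for v in (1, 2**30, 2**60)]
```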

7.3 Small Integer Cache and String Interning

# Small integer cache [-5, 256]
a = 256
b = 256
a is b  # True (same object)

a = 257
b = 257
a is b  # False (different objects, in interactive mode)

# String interning
a = "hello"
b = "hello"
a is b  # True (interning optimization)
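
Automatic interning mostly applies to literals that appear in source code. Strings built at run time are usually separate objects, and sys.intern can be used to share them explicitly. A sketch of CPython behavior, which other implementations are free to change:

```python
import sys

parts = ["hello", " ", "world"]
a = "".join(parts)     # built at run time, so not interned automatically
b = "".join(parts)

print(a == b)          # True: same contents
distinct_before = a is not b
print(distinct_before) # True on CPython: two separate objects

a = sys.intern(a)
b = sys.intern(b)
print(a is b)          # True: both names now refer to the interned copy
```

This is why identity (is) should never replace equality (==) when comparing strings or numbers.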

8. Type Hints

8.1 Basic Usage

from typing import Optional, Union, List, Dict, Tuple, Callable

def greet(name: str) -> str:
    return f"Hello, {name}"

def process(data: List[int], 
            callback: Callable[[int], bool],
            config: Optional[Dict[str, str]] = None) -> Tuple[int, ...]:
    ...

8.2 Generics and Protocol

from typing import Generic, List, Protocol, TypeVar

T = TypeVar('T')

class Stack(Generic[T]):
    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()

# Structural subtyping (the type-annotated version of Duck Typing)
class Drawable(Protocol):
    def draw(self) -> None: ...
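
The two ideas combine naturally: a generic Stack can hold anything that satisfies Drawable, with no inheritance required. In the sketch below, Circle and render are hypothetical names, and runtime_checkable is added only so that isinstance works for the demonstration:

```python
from typing import Generic, Protocol, TypeVar, runtime_checkable

T = TypeVar("T")

class Stack(Generic[T]):
    def __init__(self) -> None:
        self._items: list[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()

@runtime_checkable
class Drawable(Protocol):
    def draw(self) -> None: ...

class Circle:                      # note: does not inherit from Drawable
    def __init__(self) -> None:
        self.drawn = False

    def draw(self) -> None:
        self.drawn = True

def render(shapes: Stack[Drawable]) -> None:
    shapes.pop().draw()

shapes: Stack[Drawable] = Stack()
circle = Circle()
shapes.push(circle)                # accepted by a type checker: Circle has draw()
render(shapes)
print(circle.drawn, isinstance(circle, Drawable))   # True True
```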

8.3 Toolchain

  • mypy: static type checker
  • pyright: developed by Microsoft, integrated with VS Code
  • pydantic: runtime type validation

9. Performance Optimization

9.1 Profiling

# cProfile
import cProfile
cProfile.run('my_function()')

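# line_profiler (third-party): kernprof injects the `profile` decorator at run time,
# so no import is needed when the script is launched through it
@profile
def slow_function():
    ...
# Run with: kernprof -l -v script.py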
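
cProfile can also be driven programmatically, which helps when only part of a program should be profiled. A sketch using cProfile.Profile with pstats; the workload in slow_function is an illustrative placeholder:

```python
import cProfile
import io
import pstats

def slow_function(n=200_000):
    """Illustrative workload: sum of squares in a Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
result = slow_function()
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)   # top 5 entries by cumulative time
report = buffer.getvalue()
print(report)
```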

9.2 Acceleration Approaches

| Approach        | Mechanism                     | Speedup              | Use Case                |
|-----------------|-------------------------------|----------------------|-------------------------|
| NumPy           | Vectorized operations in C    | 10-100x              | Numerical computation   |
| Cython          | Python → C compilation        | 10-100x              | Compute-intensive loops |
| Numba           | LLVM JIT compilation          | 10-200x              | Numerical loops         |
| PyPy            | Whole-program JIT compilation | 2-10x                | General Python          |
| multiprocessing | Multi-process to bypass GIL   | Nx (number of cores) | CPU-bound tasks         |

# Numba example
import random

from numba import jit

@jit(nopython=True)
def monte_carlo_pi(n):
    count = 0
    for i in range(n):
        x = random.random()
        y = random.random()
        if x**2 + y**2 <= 1.0:
            count += 1
    return 4.0 * count / n
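
Speedup figures like those in the table depend heavily on the workload, so it is worth measuring before adopting any of these tools. The sketch below uses only the standard library's timeit to compare an explicit Python loop with the built-in sum, a small-scale version of the "move the loop into C" idea; the function names are illustrative and no particular ratio is promised:

```python
import timeit

def loop_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total

def builtin_sum(n):
    return sum(range(n))     # the loop runs inside the interpreter's C code

n = 100_000
loop_time = timeit.timeit(lambda: loop_sum(n), number=20)
builtin_time = timeit.timeit(lambda: builtin_sum(n), number=20)
print(f"loop: {loop_time:.3f}s, built-in: {builtin_time:.3f}s, "
      f"ratio: {loop_time / builtin_time:.1f}x")
```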

10. Data Model (Dunder Methods)

Python's data model uses special methods (double-underscore methods) to make custom objects behave like built-in types.

10.1 Core Special Methods

class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Vector({self.x}, {self.y})"

    def __str__(self):
        return f"({self.x}, {self.y})"

    def __add__(self, other):
        return Vector(self.x + other.x, self.y + other.y)

    def __abs__(self):
        return (self.x**2 + self.y**2)**0.5

    def __bool__(self):
        return abs(self) != 0

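    def __eq__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented  # let Python try the other operand or fall back to identity
        return self.x == other.x and self.y == other.y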

    def __hash__(self):
        return hash((self.x, self.y))

    def __len__(self):
        return 2

    def __getitem__(self, index):
        if index == 0: return self.x
        if index == 1: return self.y
        raise IndexError
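
Each special method is triggered by ordinary syntax or a built-in function rather than called by name. The sketch below uses a condensed copy of the class so that it runs on its own:

```python
class Vector:
    """Condensed copy of the class above, enough to exercise the protocol."""
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __repr__(self):
        return f"Vector({self.x}, {self.y})"
    def __add__(self, other):
        return Vector(self.x + other.x, self.y + other.y)
    def __abs__(self):
        return (self.x**2 + self.y**2) ** 0.5
    def __bool__(self):
        return abs(self) != 0
    def __eq__(self, other):
        return self.x == other.x and self.y == other.y
    def __hash__(self):
        return hash((self.x, self.y))
    def __len__(self):
        return 2
    def __getitem__(self, index):
        if index == 0: return self.x
        if index == 1: return self.y
        raise IndexError(index)

v = Vector(3, 4) + Vector(0, 0)     # __add__
print(repr(v), abs(v), bool(v))     # Vector(3, 4) 5.0 True
print(bool(Vector(0, 0)))           # False, via __bool__
x, y = v                            # unpacking falls back to __getitem__
print(x, y, list(v))                # 3 4 [3, 4]
unique = {Vector(1, 2), Vector(1, 2)}
print(len(unique))                  # 1: __eq__ and __hash__ agree
```

Defining __eq__ without __hash__ would make instances unhashable, which is why the two appear together.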

10.2 Descriptor Protocol

class Validator:
    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:  # accessed on the class itself, e.g. Product.price
            return self
        return getattr(obj, f'_{self.name}', None)

    def __set__(self, obj, value):
        if not isinstance(value, (int, float)):
            raise TypeError(f"{self.name} must be a number")
        if value < 0:
            raise ValueError(f"{self.name} must be >= 0")
        setattr(obj, f'_{self.name}', value)

class Product:
    price = Validator()
    quantity = Validator()
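
Assignments to price and quantity are routed through Validator.__set__, so invalid values are rejected at the point of assignment. A self-contained sketch; the Product constructor is a hypothetical addition for the demo:

```python
class Validator:
    def __set_name__(self, owner, name):
        self.name = name              # called once per attribute at class creation

    def __get__(self, obj, objtype=None):
        if obj is None:               # accessed on the class, not an instance
            return self
        return getattr(obj, f"_{self.name}", None)

    def __set__(self, obj, value):
        if not isinstance(value, (int, float)):
            raise TypeError(f"{self.name} must be a number")
        if value < 0:
            raise ValueError(f"{self.name} must be >= 0")
        setattr(obj, f"_{self.name}", value)

class Product:
    price = Validator()
    quantity = Validator()

    def __init__(self, price, quantity):   # hypothetical constructor for the demo
        self.price = price                 # routed through Validator.__set__
        self.quantity = quantity

item = Product(9.99, 3)
print(item.price, item.quantity)   # 9.99 3

try:
    item.price = -1
except ValueError as exc:
    message = str(exc)
    print(message)                 # price must be >= 0
```

The rejected assignment leaves the previous value in place, because __set__ raises before calling setattr.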
