
Calculus

Single-Variable Calculus

Limits & Continuity

  • Condition for a Limit to Exist (the left- and right-hand limits agree) $$ \lim_{x \to a^-} f(x) = \lim_{x \to a^+} f(x) = L $$
  • Continuity Criterion $$ \lim_{x \to x_0} f(x) = f(x_0) $$
  • Intermediate Value Theorem If \(f(x)\) is continuous on \([a, b]\) and \(M\) lies between \(f(a)\) and \(f(b)\), then there exists at least one \(c \in (a, b)\) such that: $$ f(c) = M $$

Definition of Derivative

  • Derivative Definition (Instantaneous Rate of Change) $$ f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} $$
  • Differential $$ dy = f'(x) dx $$
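
A minimal numeric sketch of the limit definition (plain Python; the example \(f(x) = x^2\) and the step sizes are illustrative choices):

```python
# Forward-difference quotient (f(x+h) - f(x)) / h approaching f'(x),
# illustrated on f(x) = x**2 at x = 3, where the exact derivative is 6.
f = lambda x: x**2
x = 3.0
for h in [1e-1, 1e-2, 1e-4, 1e-6]:
    print(f"h = {h:.0e}:  difference quotient = {(f(x + h) - f(x)) / h:.8f}")
# As h shrinks, the quotient tends to f'(3) = 6.
```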

Common Derivatives

  • Constant: \((C)' = 0\)
  • Power Function: \((x^n)' = nx^{n-1}\)
  • Exponential Function: \((e^x)' = e^x\) ; \((a^x)' = a^x \ln a\)
  • Logarithmic Function: \((\ln x)' = \frac{1}{x}\) ; \((\log_a x)' = \frac{1}{x \ln a}\)
  • Trigonometric Functions:
    • \((\sin x)' = \cos x\)
    • \((\cos x)' = -\sin x\)
    • \((\tan x)' = \sec^2 x\)

Differentiation Rules

  • Sum/Difference Rule: \((u \pm v)' = u' \pm v'\)
  • Product Rule: \((uv)' = u'v + uv'\)
  • Quotient Rule: \((\frac{u}{v})' = \frac{u'v - uv'}{v^2}\)
  • Chain Rule:

    \[ \frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx} \]

    or \([f(g(x))]' = f'(g(x)) \cdot g'(x)\)
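
These rules can be spot-checked numerically. A small sketch (plain Python; the choices \(u = \sin\), \(v = \exp\) and the central-difference step are assumptions for illustration):

```python
import math

def d(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.3
# Product rule: (uv)' = u'v + uv', with u = sin, v = exp
lhs = d(lambda t: math.sin(t) * math.exp(t), x)
rhs = math.cos(x) * math.exp(x) + math.sin(x) * math.exp(x)
print(lhs, rhs)  # agree to ~1e-9

# Chain rule: [f(g(x))]' = f'(g(x)) * g'(x), with f = exp, g = sin
lhs = d(lambda t: math.exp(math.sin(t)), x)
rhs = math.exp(math.sin(x)) * math.cos(x)
print(lhs, rhs)
```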

Higher-Order Derivatives

  • Second Derivative: \(y'' = \frac{d}{dx}(\frac{dy}{dx}) = \frac{d^2y}{dx^2}\)
  • \(n\)-th Order Derivative Notation: \(f^{(n)}(x)\)

Geometric Applications

  • Slope of the Tangent Line: \(k = f'(x_0)\)
  • Equation of the Tangent Line: \(y - y_0 = f'(x_0)(x - x_0)\)
  • Equation of the Normal Line: \(y - y_0 = -\frac{1}{f'(x_0)}(x - x_0)\)
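
The two line equations translate directly into code. A short sketch (plain Python; \(f(x) = x^2\) and \(x_0 = 1\) are illustrative):

```python
# Tangent and normal lines to f(x) = x**2 at x0 = 1, where f'(x) = 2x.
f = lambda x: x**2
fprime = lambda x: 2 * x

x0 = 1.0
y0, k = f(x0), fprime(x0)              # point of tangency and slope

tangent = lambda x: y0 + k * (x - x0)  # y - y0 = f'(x0)(x - x0)
normal = lambda x: y0 - (x - x0) / k   # slope -1/f'(x0); requires k != 0

print(tangent(2.0))  # 3.0
print(normal(2.0))   # 0.5
```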

Integrals (Supplementary)

  • Fundamental Theorem of Calculus (Newton-Leibniz Formula)

    \[ \int_a^b f(x) dx = F(b) - F(a) \]

    (Note: \(F(x)\) is an antiderivative of \(f(x)\), i.e., \(F'(x) = f(x)\))

Mean Value Theorems

  • Rolle's Theorem If \(f(x)\) is continuous on \([a,b]\), differentiable on \((a,b)\), and \(f(a)=f(b)\), then: $$ \exists \xi \in (a,b), \text{ s.t. } f'(\xi) = 0 $$
  • Lagrange Mean Value Theorem (Lagrange MVT) If \(f(x)\) is continuous on \([a,b]\) and differentiable on \((a,b)\), then: $$ \exists \xi \in (a,b), \text{ s.t. } f'(\xi) = \frac{f(b) - f(a)}{b - a} $$
  • Cauchy Mean Value Theorem (Cauchy MVT) If \(f(x), F(x)\) both satisfy the above conditions and \(F'(x) \neq 0\) on \((a,b)\), then: $$ \exists \xi \in (a,b), \text{ s.t. } \frac{f'(\xi)}{F'(\xi)} = \frac{f(b) - f(a)}{F(b) - F(a)} $$
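
When \(f'\) is monotone, the MVT point \(\xi\) can be located numerically. A sketch (plain Python; \(f(x) = x^3\) on \([0, 2]\) is an illustrative choice with closed-form answer \(\xi = 2/\sqrt{3}\)):

```python
# Solve f'(xi) = (f(b) - f(a)) / (b - a) by bisection for f(x) = x**3 on [0, 2]:
# 3*xi**2 = 4, so xi = 2/sqrt(3) ~ 1.1547.
f = lambda x: x**3
fprime = lambda x: 3 * x**2
a, b = 0.0, 2.0
secant = (f(b) - f(a)) / (b - a)

lo, hi = a, b              # f' is increasing on [0, 2], so bisection applies
for _ in range(60):
    mid = (lo + hi) / 2
    if fprime(mid) < secant:
        lo = mid
    else:
        hi = mid
print(lo, 2 / 3**0.5)      # both ~1.1547
```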

Differentiation & Integration

  • Differential

    \[ dy = f'(x) dx \]

    (Interpretation: local linearization of a nonlinear function, approximating the curve's increment by the tangent line's increment)

  • Definite Integral (see the Riemann-sum sketch after this list)

    $$ \int_a^b f(x) dx = \lim_{n \to \infty} \sum_{i=1}^n f(\xi_i) \Delta x $$

  • Indefinite Integral

    $$ \int f(x) dx = F(x) + C $$

  • Fundamental Theorem of Calculus (Newton-Leibniz Formula)

    \[ \int_a^b f(x) dx = F(b) - F(a) \]
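
A Riemann-sum sketch of the definite integral (assuming NumPy; the integrand \(x^2\) on \([0, 1]\) is illustrative, with Newton-Leibniz value \(1/3\)):

```python
import numpy as np

# Left-endpoint Riemann sums of f(x) = x**2 on [0, 1] converging to
# F(1) - F(0) = 1/3 for the antiderivative F(x) = x**3 / 3.
f = lambda x: x**2
a, b = 0.0, 1.0
for n in [10, 100, 10_000]:
    x = np.linspace(a, b, n, endpoint=False)  # left endpoints xi_i
    dx = (b - a) / n
    print(n, f(x).sum() * dx)                 # approaches 1/3
```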

Applications of Derivatives: Monotonicity and Extrema

  • Stationary Point A point where \(f'(x) = 0\).
  • Monotonicity Test
    • \(f'(x) > 0 \implies\) monotonically increasing
    • \(f'(x) < 0 \implies\) monotonically decreasing
  • Concavity Test
    • \(f''(x) > 0 \implies\) concave up (convex)
    • \(f''(x) < 0 \implies\) concave down
  • Second Derivative Test If \(f'(x_0) = 0\):
    • \(f''(x_0) < 0 \implies\) local maximum
    • \(f''(x_0) > 0 \implies\) local minimum
  • Saddle Point (1D) A point where \(f'(x) = 0\) but no extremum occurs: the function keeps the same direction of increase/decrease on both sides (e.g., \(f(x) = x^3\) at \(x = 0\); see the classification sketch below).
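
The tests above condense into a small classifier. A sketch (plain Python; the example functions \(x^3 - 3x\) and \(x^3\) are illustrative):

```python
# Second derivative test at stationary points of f(x) = x**3 - 3*x,
# where f'(x) = 3*x**2 - 3 vanishes at x = -1 and x = 1 and f''(x) = 6*x;
# g(x) = x**3 at x = 0 shows the inconclusive (non-extremal) case.
def classify(fpp_at_x0):
    if fpp_at_x0 < 0:
        return "local maximum"
    if fpp_at_x0 > 0:
        return "local minimum"
    return "inconclusive: examine the sign of f' on both sides"

fpp = lambda x: 6 * x
print(-1, classify(fpp(-1)))  # local maximum
print(+1, classify(fpp(+1)))  # local minimum
print(0, classify(0.0))       # inconclusive, e.g. g(x) = x**3 at x = 0
```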

Taylor's Formula

  • Core Idea Approximate a function near a given point using a polynomial by matching derivatives of all orders.
  • Taylor Expansion $$ f(x) \approx f(x_0) + f'(x_0)(x-x_0) + \frac{f''(x_0)}{2!}(x-x_0)^2 + \dots + \frac{f^{(n)}(x_0)}{n!}(x-x_0)^n $$
  • Maclaurin Series The Taylor expansion centered at \(x_0 = 0\): $$ f(x) \approx \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} x^n $$
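
A convergence sketch for the Maclaurin series of \(e^x\) (plain Python; the evaluation point \(x = 1\) is illustrative):

```python
import math

# Partial sums sum_{k=0}^{n} x**k / k! of the Maclaurin series of e^x.
def maclaurin_exp(x, n):
    return sum(x**k / math.factorial(k) for k in range(n + 1))

x = 1.0
for n in [1, 2, 4, 8]:
    approx = maclaurin_exp(x, n)
    print(n, approx, abs(approx - math.exp(x)))  # error shrinks rapidly with n
```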


Multivariable Calculus

Partial Derivatives

  • Core Concept: Hold all other variables constant and study the rate of change of the function along a single coordinate axis.
  • Definition (partial derivative with respect to \(x\) as an example):
\[ \frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x+h, y) - f(x, y)}{h} \]
  • Higher-Order Mixed Partial Derivatives: If the partial derivatives are continuous, the order of differentiation does not matter:
\[ \frac{\partial^2 z}{\partial x \partial y} = \frac{\partial^2 z}{\partial y \partial x} \]
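
The symmetry of mixed partials can be verified with a finite-difference stencil. A sketch (plain Python; \(f(x, y) = x^2 y + \sin(xy)\) and the step \(h\) are illustrative):

```python
import math

# 4-point central stencil for d2f/dxdy of f(x, y) = x**2 * y + sin(x*y),
# compared with the analytic mixed partial 2x + cos(xy) - xy*sin(xy).
f = lambda x, y: x**2 * y + math.sin(x * y)
h = 1e-4
x, y = 0.7, 1.2

mixed = (f(x + h, y + h) - f(x + h, y - h)
         - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
exact = 2 * x + math.cos(x * y) - x * y * math.sin(x * y)
print(mixed, exact)  # agree to ~1e-7: the order of differentiation is immaterial
```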

Directional Derivative

  • Core Concept: The rate of change of a multivariable function along any specified direction \(u\).
  • Formula (simplified using partial derivatives):
\[ D_u f(x, y) = f_x(x, y) \cos \theta + f_y(x, y) \sin \theta \]

(Note: \(\theta\) is the angle between the unit direction vector \(u\) and the positive \(x\)-axis; the formula assumes \(f\) is differentiable at the point)
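
A direct evaluation of the formula (plain Python; \(f(x, y) = x^2 + xy\), the point \((1, 2)\), and \(\theta = \pi/3\) are illustrative):

```python
import math

# D_u f = f_x * cos(theta) + f_y * sin(theta) for f(x, y) = x**2 + x*y,
# whose partials are f_x = 2x + y and f_y = x.
fx = lambda x, y: 2 * x + y
fy = lambda x, y: x

x, y, theta = 1.0, 2.0, math.pi / 3
D_u = fx(x, y) * math.cos(theta) + fy(x, y) * math.sin(theta)
print(D_u)  # 4*cos(pi/3) + 1*sin(pi/3) = 2 + sqrt(3)/2
```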

Gradient \(\nabla f\)

  • Core Concept: A vector composed of all partial derivatives. It points in the direction of the steepest ascent of the function.
  • Definition:
\[ \nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n} \right) \]
  • Properties:
    • Maximum Rate of Change: The directional derivative attains its maximum value in the direction of the gradient, and that maximum equals the magnitude \(\|\nabla f\|\).
    • Gradient Descent: In machine learning, moving in the direction opposite to the gradient, \(-\nabla f\), yields the fastest decrease of the function (a minimal loop follows below).
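
A minimal gradient-descent loop (assuming NumPy; the quadratic objective, the target point \(c\), the step size, and the iteration count are illustrative):

```python
import numpy as np

# Gradient descent on f(x) = ||x - c||^2 / 2, whose gradient is x - c;
# stepping along -grad f drives the iterate toward the minimizer c.
c = np.array([3.0, -1.0])
grad = lambda x: x - c

x = np.zeros(2)
lr = 0.1                  # step size (learning rate)
for _ in range(100):
    x -= lr * grad(x)     # move opposite to the gradient
print(x)                  # ~ [3, -1]
```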

Jacobian Matrix \(J\)

  • Core Concept: The first-order derivative matrix of a vector-valued function with respect to a vector. It describes the local linear transformation in a multidimensional space.
  • Definition: Given \(F: \mathbb{R}^n \to \mathbb{R}^m\):
\[ J = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \dots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \dots & \frac{\partial f_m}{\partial x_n} \end{bmatrix} \]
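
A finite-difference Jacobian sketch (assuming NumPy; the map \(F(x, y) = (xy,\ x + y^2)\) is illustrative, with analytic Jacobian \(\begin{bmatrix} y & x \\ 1 & 2y \end{bmatrix}\)):

```python
import numpy as np

def F(v):
    x, y = v
    return np.array([x * y, x + y**2])

def jacobian(F, v, h=1e-6):
    """Approximate J column by column with central differences."""
    m, n = len(F(v)), len(v)
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (F(v + e) - F(v - e)) / (2 * h)  # column j: dF/dx_j
    return J

print(jacobian(F, np.array([1.0, 2.0])))  # ~ [[2, 1], [1, 4]]
```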

Hessian Matrix \(H\)

  • Core Concept: A symmetric matrix composed of all second-order partial derivatives of a multivariable function. It describes the curvature (concavity/convexity) of the function.
  • Definition (shown for two variables; in general \(H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}\)):
\[ H = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} \end{bmatrix} \]

Optimization of Multivariable Functions

  • Necessary Condition: At an interior extremum of a differentiable function, the gradient must be the zero vector: \(\nabla f = \mathbf{0}\) (such a point is called a stationary point).
  • Sufficient Conditions (based on the Hessian matrix \(H\)):
    • \(H\) is positive definite: The stationary point is a local minimum (analogous to \(f''(x)>0\)).
    • \(H\) is negative definite: The stationary point is a local maximum (analogous to \(f''(x)<0\)).
    • \(H\) is indefinite: The stationary point is a saddle point.
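
The eigenvalue signs of \(H\) decide its definiteness, so the classification is a few lines (assuming NumPy; the two example Hessians come from \(x^2 - y^2\) and \(x^2 + y^2\) at the origin):

```python
import numpy as np

def classify(H):
    """Classify a stationary point from its (symmetric) Hessian."""
    eig = np.linalg.eigvalsh(H)
    if np.all(eig > 0):
        return "local minimum"      # positive definite
    if np.all(eig < 0):
        return "local maximum"      # negative definite
    if np.any(eig > 0) and np.any(eig < 0):
        return "saddle point"       # indefinite
    return "inconclusive (singular Hessian)"

print(classify(np.array([[2.0, 0.0], [0.0, -2.0]])))  # saddle: f = x^2 - y^2
print(classify(np.array([[2.0, 0.0], [0.0, 2.0]])))   # local min: f = x^2 + y^2
```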

Common Differentiation Identities

  • Derivative of a Transpose $$ \frac{\partial}{\partial \mathbf{X}} f(\mathbf{X})^{\top} = \left( \frac{\partial f(\mathbf{X})}{\partial \mathbf{X}} \right)^{\top} $$
  • Derivative of a Trace $$ \frac{\partial}{\partial \mathbf{X}} \text{tr}(f(\mathbf{X})) = \text{tr} \left( \frac{\partial f(\mathbf{X})}{\partial \mathbf{X}} \right) $$
  • Derivative of a Determinant $$ \frac{\partial}{\partial \mathbf{X}} \det(f(\mathbf{X})) = \det(f(\mathbf{X})) \text{tr} \left( f^{-1}(\mathbf{X}) \frac{\partial f(\mathbf{X})}{\partial \mathbf{X}} \right) $$
  • Derivative of an Inverse $$ \frac{\partial}{\partial \mathbf{X}} f^{-1}(\mathbf{X}) = -f^{-1}(\mathbf{X}) \frac{\partial f(\mathbf{X})}{\partial \mathbf{X}} f^{-1}(\mathbf{X}) $$
  • Derivatives of Quadratic Forms and Linear Mappings
    • With respect to matrix \(\mathbf{X}\):

      \[ \frac{\partial \mathbf{a}^{\top} \mathbf{X}^{-1} \mathbf{b}}{\partial \mathbf{X}} = -(\mathbf{X}^{-1})^{\top} \mathbf{a} \mathbf{b}^{\top} (\mathbf{X}^{-1})^{\top} \]

      $$ \frac{\partial \mathbf{a}^{\top} \mathbf{X} \mathbf{b}}{\partial \mathbf{X}} = \mathbf{a} \mathbf{b}^{\top} $$

    • With respect to vector \(\mathbf{x}\) (numerator layout: the derivative of a scalar with respect to a column vector is written as a row vector):

      \[ \frac{\partial \mathbf{x}^{\top} \mathbf{a}}{\partial \mathbf{x}} = \mathbf{a}^{\top} \]
      \[ \frac{\partial \mathbf{a}^{\top} \mathbf{x}}{\partial \mathbf{x}} = \mathbf{a}^{\top} \]
      \[ \frac{\partial \mathbf{x}^{\top} \mathbf{B} \mathbf{x}}{\partial \mathbf{x}} = \mathbf{x}^{\top} (\mathbf{B} + \mathbf{B}^{\top}) \]

  • Derivative of Least Squares / Weighted Loss Function If \(\mathbf{W}\) is a symmetric matrix (a numeric check of these identities follows this list):

    \[ \frac{\partial}{\partial \mathbf{s}} (\mathbf{x} - \mathbf{A}\mathbf{s})^{\top} \mathbf{W} (\mathbf{x} - \mathbf{A}\mathbf{s}) = -2(\mathbf{x} - \mathbf{A}\mathbf{s})^{\top} \mathbf{W} \mathbf{A} \]
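
The vector identities can be checked against finite differences (assuming NumPy; the random problem sizes and the diagonal symmetric \(\mathbf{W}\) are illustrative; the checks compare gradients, i.e. the transposes of the row-vector derivatives above):

```python
import numpy as np

def num_grad(f, v, h=1e-6):
    """Central-difference gradient of a scalar function of a vector."""
    g = np.zeros_like(v)
    for i in range(len(v)):
        e = np.zeros_like(v)
        e[i] = h
        g[i] = (f(v + e) - f(v - e)) / (2 * h)
    return g

rng = np.random.default_rng(0)

# d(x^T B x)/dx = x^T (B + B^T), i.e. gradient (B + B^T) x
B = rng.standard_normal((3, 3))
x = rng.standard_normal(3)
print(np.allclose(num_grad(lambda v: v @ B @ v, x), (B + B.T) @ x))  # True

# d/ds (x - A s)^T W (x - A s) = -2 (x - A s)^T W A for symmetric W
A = rng.standard_normal((4, 3))
W = np.diag([1.0, 2.0, 3.0, 4.0])               # symmetric weight matrix
s, xv = rng.standard_normal(3), rng.standard_normal(4)
loss = lambda s: (xv - A @ s) @ W @ (xv - A @ s)
print(np.allclose(num_grad(loss, s), -2 * A.T @ W @ (xv - A @ s)))   # True
```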

Constrained Optimization

Lagrange Multipliers

For equality-constrained optimization problems:

\[\min f(x) \quad \text{s.t.} \quad g_i(x) = 0, \; i = 1, \ldots, m\]

Construct the Lagrangian:

\[\mathcal{L}(x, \lambda) = f(x) + \sum_{i=1}^{m} \lambda_i g_i(x)\]

Necessary conditions (first-order conditions):

\[\nabla_x \mathcal{L} = 0, \quad \nabla_\lambda \mathcal{L} = 0\]
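
For a quadratic objective with a linear constraint these first-order conditions form a linear system. A worked sketch (assuming NumPy; the problem \(\min x^2 + y^2\) s.t. \(x + y = 1\) is an illustrative choice):

```python
import numpy as np

# L(x, y, lam) = x**2 + y**2 + lam*(x + y - 1); the first-order conditions
#   2x + lam = 0,  2y + lam = 0,  x + y = 1
# are linear, so one solve yields the stationary point of the Lagrangian.
M = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
rhs = np.array([0.0, 0.0, 1.0])
x, y, lam = np.linalg.solve(M, rhs)
print(x, y, lam)  # 0.5 0.5 -1.0: the closest point on x + y = 1 to the origin
```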

KKT Conditions

For inequality-constrained optimization:

\[\min f(x) \quad \text{s.t.} \quad g_i(x) \leq 0, \; h_j(x) = 0\]

KKT conditions (necessary; also sufficient for convex optimization):

  1. Stationarity: \(\nabla f(x^*) + \sum \mu_i \nabla g_i(x^*) + \sum \lambda_j \nabla h_j(x^*) = 0\)
  2. Primal feasibility: \(g_i(x^*) \leq 0, \; h_j(x^*) = 0\)
  3. Dual feasibility: \(\mu_i \geq 0\)
  4. Complementary slackness: \(\mu_i g_i(x^*) = 0\)
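
A hand-check of the four conditions on a one-dimensional problem (plain Python; the problem \(\min x^2\) s.t. \(1 - x \leq 0\), with solution \(x^* = 1\) and multiplier \(\mu = 2\), is an illustrative choice):

```python
# min x**2  s.t.  g(x) = 1 - x <= 0. The constraint is active at x* = 1.
x_star, mu = 1.0, 2.0

stationarity = 2 * x_star + mu * (-1.0)  # f'(x*) + mu * g'(x*) = 0
primal = (1.0 - x_star) <= 0             # g(x*) <= 0
dual = mu >= 0                           # mu >= 0
slackness = mu * (1.0 - x_star)          # mu * g(x*) = 0

print(stationarity, primal, dual, slackness)  # 0.0 True True 0.0
```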

Convexity

Convex Sets and Convex Functions

Convex set: \(\forall x, y \in C, \; \forall \theta \in [0,1]: \; \theta x + (1-\theta)y \in C\)

Convex function: \(f(\theta x + (1-\theta)y) \leq \theta f(x) + (1-\theta)f(y)\)

Second-order condition: a twice-differentiable \(f\) is convex \(\Leftrightarrow\) its Hessian matrix is positive semi-definite everywhere: \(H \succeq 0\)

The importance of convex optimization: local optimum = global optimum.
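
The defining inequality is easy to probe numerically (assuming NumPy; \(f(x) = \|x\|^2\), the dimension, and the sample count are illustrative):

```python
import numpy as np

# Sample the convexity inequality f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y)
# for f(v) = ||v||^2 at random points and interpolation weights t.
rng = np.random.default_rng(1)
f = lambda v: float(v @ v)

ok = True
for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    t = rng.uniform()
    ok &= f(t * x + (1 - t) * y) <= t * f(x) + (1 - t) * f(y) + 1e-12
print(ok)  # True: ||.||^2 is convex
```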

Matrix Calculus

Scalar-to-Vector Derivatives

\[\frac{\partial}{\partial x}(a^T x) = a, \quad \frac{\partial}{\partial x}(x^T A x) = (A + A^T)x\]

(These and the table below use the denominator/gradient layout, in which the derivative of a scalar is a column vector; they are the transposes of the numerator-layout identities given earlier.)

Common Matrix Calculus Identities

| Function | Derivative |
| --- | --- |
| \(a^T x\) | \(a\) |
| \(x^T A x\) | \((A + A^T)x\) |
| \(\lVert Ax - b \rVert^2\) | \(2A^T(Ax - b)\) |
| \(\text{tr}(AB)\) (w.r.t. \(A\)) | \(B^T\) |
| \(\log\det(X)\) (w.r.t. \(X\)) | \(X^{-1}\) |

These identities are essential when deriving ML algorithms such as least squares, logistic regression, and Gaussian processes.
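
As one example, the \(\lVert Ax - b \rVert^2\) entry yields the least-squares normal equations. A sketch (assuming NumPy; the random problem sizes are illustrative):

```python
import numpy as np

# Setting the gradient 2 A^T (A x - b) to zero gives A^T A x = A^T b.
rng = np.random.default_rng(2)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)         # normal-equation solution
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)      # library least squares
print(np.allclose(x_normal, x_lstsq))                # True
print(np.linalg.norm(2 * A.T @ (A @ x_normal - b)))  # gradient ~ 0 at the optimum
```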

See Optimization Theory for more comprehensive coverage of convex optimization.

