Calculus
Single-Variable Calculus
Limits & Continuity
- Condition for a Limit to Exist $$ \lim_{x \to a^-} f(x) = \lim_{x \to a^+} f(x) = L $$
- Continuity Criterion $$ \lim_{x \to x_0} f(x) = f(x_0) $$
- Intermediate Value Theorem If \(f(x)\) is continuous on \([a, b]\) and \(M\) lies between \(f(a)\) and \(f(b)\), then there exists at least one \(c \in (a, b)\) such that: $$ f(c) = M $$
Definition of Derivative
- Derivative Definition (Instantaneous Rate of Change) $$ f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} $$
- Differential $$ dy = f'(x) dx $$
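
A minimal numeric sketch of the limit definition (the function and evaluation point are arbitrary choices): the difference quotient approaches \(f'(3) = 6\) for \(f(x) = x^2\) as \(h\) shrinks.

```python
# Difference quotient (f(x+h) - f(x)) / h for f(x) = x^2 at x = 3, where f'(3) = 6.
def f(x):
    return x ** 2

x0 = 3.0
for h in (1e-1, 1e-3, 1e-5):
    print(h, (f(x0 + h) - f(x0)) / h)   # 6.1, 6.001, 6.00001 -> converges to 6
```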
Common Derivatives
- Constant: \((C)' = 0\)
- Power Function: \((x^n)' = nx^{n-1}\)
- Exponential Function: \((e^x)' = e^x\) ; \((a^x)' = a^x \ln a\)
- Logarithmic Function: \((\ln x)' = \frac{1}{x}\) ; \((\log_a x)' = \frac{1}{x \ln a}\)
- Trigonometric Functions:
- \((\sin x)' = \cos x\)
- \((\cos x)' = -\sin x\)
- \((\tan x)' = \sec^2 x\)
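
A symbolic spot-check of the table above (assumes sympy is installed; variable names are illustrative):

```python
# Verify each entry of the derivative table symbolically with sympy.
import sympy as sp

x, a, n = sp.symbols('x a n', positive=True)

checks = [
    sp.diff(x ** n, x) - n * x ** (n - 1),
    sp.diff(sp.exp(x), x) - sp.exp(x),
    sp.diff(a ** x, x) - a ** x * sp.log(a),
    sp.diff(sp.log(x), x) - 1 / x,
    sp.diff(sp.sin(x), x) - sp.cos(x),
    sp.diff(sp.tan(x), x) - sp.sec(x) ** 2,
]
assert all(sp.simplify(c) == 0 for c in checks)  # every identity holds
```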
Differentiation Rules
- Sum/Difference Rule: \((u \pm v)' = u' \pm v'\)
- Product Rule: \((uv)' = u'v + uv'\)
- Quotient Rule: \((\frac{u}{v})' = \frac{u'v - uv'}{v^2}\)
- Chain Rule: \(\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}\) or \([f(g(x))]' = f'(g(x)) \cdot g'(x)\)
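
A quick numeric sanity check on an arbitrarily chosen composite (the function and point are illustrative): \(h(x) = \sin(x^2)\) has \(h'(x) = \cos(x^2) \cdot 2x\) by the chain rule.

```python
import math

# Chain rule check: h(x) = sin(x^2), so h'(x) = f'(g(x)) * g'(x) = cos(x^2) * 2x.
def h(x):
    return math.sin(x ** 2)

def h_prime(x):
    return math.cos(x ** 2) * 2 * x

x0, eps = 1.3, 1e-6
numeric = (h(x0 + eps) - h(x0 - eps)) / (2 * eps)  # central difference
print(numeric, h_prime(x0))  # the two values agree to about 1e-9
```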
Higher-Order Derivatives
- Second Derivative: \(y'' = \frac{d}{dx}(\frac{dy}{dx}) = \frac{d^2y}{dx^2}\)
- \(n\)-th Order Derivative Notation: \(f^{(n)}(x)\)
Geometric Applications
- Slope of the Tangent Line: \(k = f'(x_0)\)
- Equation of the Tangent Line: \(y - y_0 = f'(x_0)(x - x_0)\)
- Equation of the Normal Line (for \(f'(x_0) \neq 0\)): \(y - y_0 = -\frac{1}{f'(x_0)}(x - x_0)\)
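
A worked instance with illustrative numbers: for \(f(x) = x^2\) at \(x_0 = 1\) we have \(y_0 = 1\) and \(f'(1) = 2\), so

$$ \text{tangent: } y - 1 = 2(x - 1), \qquad \text{normal: } y - 1 = -\tfrac{1}{2}(x - 1) $$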
Mean Value Theorems
- Rolle's Theorem If \(f(x)\) is continuous on \([a,b]\), differentiable on \((a,b)\), and \(f(a)=f(b)\), then: $$ \exists \xi \in (a,b), \text{ s.t. } f'(\xi) = 0 $$
- Lagrange Mean Value Theorem (Lagrange MVT) If \(f(x)\) is continuous on \([a,b]\) and differentiable on \((a,b)\), then: $$ \exists \xi \in (a,b), \text{ s.t. } f'(\xi) = \frac{f(b) - f(a)}{b - a} $$
- Cauchy Mean Value Theorem (Cauchy MVT) If \(f(x)\) and \(F(x)\) both satisfy the above conditions and \(F'(x) \neq 0\) on \((a,b)\), then: $$ \exists \xi \in (a,b), \text{ s.t. } \frac{f'(\xi)}{F'(\xi)} = \frac{f(b) - f(a)}{F(b) - F(a)} $$
Differentiation & Integration
- Differential $$ dy = f'(x) dx $$ (Interpretation: local linearization of a nonlinear function, approximating the curve's increment by the tangent line's increment)
- Definite Integral $$ \int_a^b f(x) dx = \lim_{n \to \infty} \sum_{i=1}^n f(\xi_i) \Delta x $$
- Indefinite Integral $$ \int f(x) dx = F(x) + C $$
- Fundamental Theorem of Calculus (Newton-Leibniz Formula) $$ \int_a^b f(x) dx = F(b) - F(a) $$ (Note: \(F(x)\) is an antiderivative of \(f(x)\), i.e., \(F'(x) = f(x)\))
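
A numeric sanity check of the formula (the integrand and interval are illustrative): a midpoint Riemann sum for \(\int_0^1 x^2\,dx\) should match \(F(1) - F(0) = 1/3\) with \(F(x) = x^3/3\).

```python
# Compare a midpoint Riemann sum for the integral of x^2 over [0, 1]
# with F(1) - F(0), where F(x) = x^3 / 3 is an antiderivative.
n = 100_000
dx = 1.0 / n
riemann = sum(((i + 0.5) * dx) ** 2 * dx for i in range(n))  # midpoint rule
exact = 1 / 3
print(riemann, exact)  # both are approximately 0.3333333
```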
Applications of Derivatives: Monotonicity and Extrema
- Stationary Point A point where \(f'(x) = 0\).
- Monotonicity Test
- \(f'(x) > 0 \implies\) monotonically increasing
- \(f'(x) < 0 \implies\) monotonically decreasing
- Concavity Test
- \(f''(x) > 0 \implies\) concave up (convex)
- \(f''(x) < 0 \implies\) concave down
- Second Derivative Test
If \(f'(x_0) = 0\):
- \(f''(x_0) < 0 \implies\) local maximum
- \(f''(x_0) > 0 \implies\) local minimum
- Saddle Point A point where \(f'(x) = 0\) but the function has no local extremum there: it keeps the same direction of increase/decrease on both sides (e.g. \(x = 0\) for \(f(x) = x^3\)); see the sketch below.
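
A minimal sketch of the tests above on an illustrative cubic (the function and values are examples, not part of the notes):

```python
# Second derivative test on f(x) = x^3 - 3x: f'(x) = 3x^2 - 3 vanishes at x = +-1,
# and the sign of f''(x) = 6x decides max vs. min at each stationary point.
for x0 in (-1.0, 1.0):
    f2 = 6 * x0
    kind = "local maximum" if f2 < 0 else "local minimum"
    print(f"x = {x0}: f''(x) = {f2} -> {kind}")
# For f(x) = x^3 the test is inconclusive at x = 0 (f''(0) = 0); checking the
# sign of f' on both sides shows it is a saddle point, not an extremum.
```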
Taylor's Formula
- Core Idea Approximate a function near a given point using a polynomial by matching derivatives of all orders.
- Taylor Expansion $$ f(x) \approx f(x_0) + f'(x_0)(x-x_0) + \frac{f''(x_0)}{2!}(x-x_0)^2 + \dots + \frac{f^{(n)}(x_0)}{n!}(x-x_0)^n $$
- Maclaurin Series The Taylor expansion centered at \(x_0 = 0\): $$ f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} x^n $$ (valid where the series converges to \(f\))
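
A short numeric illustration (the point \(x = 1.5\) and the truncation at 15 terms are arbitrary choices): partial sums of the Maclaurin series of \(e^x\) converge to math.exp.

```python
import math

# Partial sums of the Maclaurin series e^x = sum_k x^k / k! approach math.exp(x).
x = 1.5
partial = sum(x ** k / math.factorial(k) for k in range(15))
print(partial, math.exp(x))  # agree to roughly 1e-9 with 15 terms
```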
Multivariable Calculus
Partial Derivatives
- Core Concept: Hold all other variables constant and study the rate of change of the function along a single coordinate axis.
- Definition (partial derivative with respect to \(x\) as an example): $$ \frac{\partial f}{\partial x}(x_0, y_0) = \lim_{h \to 0} \frac{f(x_0 + h, y_0) - f(x_0, y_0)}{h} $$
- Higher-Order Mixed Partial Derivatives: If the mixed partial derivatives are continuous, the order of differentiation does not matter: $$ \frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x} $$
Directional Derivative
- Core Concept: The rate of change of a multivariable function along any specified direction \(u\).
- Formula (simplified using partial derivatives, for \(f(x, y)\) and unit direction \(\mathbf{u} = (\cos\theta, \sin\theta)\)): $$ D_{\mathbf{u}} f = \frac{\partial f}{\partial x} \cos\theta + \frac{\partial f}{\partial y} \sin\theta $$ (Note: \(\theta\) is the angle between the direction vector and the positive \(x\)-axis)
Gradient \(\nabla f\)
- Core Concept: A vector composed of all partial derivatives. It points in the direction of the steepest ascent of the function.
- Definition: $$ \nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n} \right)^{\top} $$
- Properties:
- Maximum Rate of Change: The directional derivative attains its maximum value in the direction of the gradient, and that maximum equals the magnitude \(\|\nabla f\|\).
- Gradient Descent: In machine learning, moving in the direction opposite to the gradient, \(-\nabla f\), yields the fastest decrease of the function.
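
A minimal gradient descent sketch under assumed choices (quadratic objective, fixed step size 0.1, 100 iterations, all illustrative):

```python
# Gradient descent on f(x, y) = x^2 + 2y^2, whose gradient is (2x, 4y).
def grad(x, y):
    return 2 * x, 4 * y

x, y, lr = 3.0, -2.0, 0.1   # starting point and learning rate (illustrative)
for _ in range(100):
    gx, gy = grad(x, y)
    x, y = x - lr * gx, y - lr * gy   # move opposite the gradient
print(x, y)  # both coordinates approach the minimizer (0, 0)
```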
Jacobian Matrix \(J\)
- Core Concept: The first-order derivative matrix of a vector-valued function with respect to a vector. It describes the local linear transformation in a multidimensional space.
- Definition: Given \(F: \mathbb{R}^n \to \mathbb{R}^m\), \(J\) is the \(m \times n\) matrix with entries \(J_{ij} = \frac{\partial F_i}{\partial x_j}\): $$ J = \begin{pmatrix} \frac{\partial F_1}{\partial x_1} & \cdots & \frac{\partial F_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial F_m}{\partial x_1} & \cdots & \frac{\partial F_m}{\partial x_n} \end{pmatrix} $$
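
A finite-difference sketch of the Jacobian for an illustrative map \(F: \mathbb{R}^2 \to \mathbb{R}^2\) (the function and evaluation point are arbitrary):

```python
import math

# Numeric Jacobian of F(x, y) = (x^2 * y, 5x + sin y) by central differences.
def F(v):
    x, y = v
    return [x ** 2 * y, 5 * x + math.sin(y)]

def numeric_jacobian(F, v, eps=1e-6):
    m, n = len(F(v)), len(v)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        vp, vm = list(v), list(v)
        vp[j] += eps
        vm[j] -= eps
        Fp, Fm = F(vp), F(vm)
        for i in range(m):
            J[i][j] = (Fp[i] - Fm[i]) / (2 * eps)  # dF_i / dx_j
    return J

print(numeric_jacobian(F, [1.0, 2.0]))
# Analytic Jacobian at (1, 2): [[2xy, x^2], [5, cos y]] = [[4, 1], [5, cos 2]]
```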
Hessian Matrix \(H\)
- Core Concept: A symmetric matrix composed of all second-order partial derivatives of a multivariable function. It describes the curvature (concavity/convexity) of the function.
- Definition: the \((i, j)\) entry is \(H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}\): $$ H = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{pmatrix} $$
Optimization of Multivariable Functions
- Necessary Condition: at an interior extremum the gradient must be the zero vector, \(\nabla f = \mathbf{0}\); points satisfying this are called stationary points.
- Sufficient Conditions (based on the Hessian matrix \(H\)):
- \(H\) is positive definite: The stationary point is a local minimum (analogous to \(f''(x)>0\)).
- \(H\) is negative definite: The stationary point is a local maximum (analogous to \(f''(x)<0\)).
- \(H\) is indefinite: The stationary point is a saddle point.
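
A small sketch of this classification on an assumed example, \(f(x, y) = x^2 - y^2\) at its stationary point \((0, 0)\), using numpy for the eigenvalues:

```python
import numpy as np

# f(x, y) = x^2 - y^2 has a constant Hessian H = [[2, 0], [0, -2]].
H = np.array([[2.0, 0.0], [0.0, -2.0]])
eig = np.linalg.eigvalsh(H)   # eigenvalues of the symmetric Hessian

if np.all(eig > 0):
    print("positive definite -> local minimum")
elif np.all(eig < 0):
    print("negative definite -> local maximum")
else:
    print("indefinite -> saddle point")   # this branch fires for the example
```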
Common Differentiation Identities
- Derivative of a Transpose $$ \frac{\partial}{\partial \mathbf{X}} f(\mathbf{X})^{\top} = \left( \frac{\partial f(\mathbf{X})}{\partial \mathbf{X}} \right)^{\top} $$
- Derivative of a Trace $$ \frac{\partial}{\partial \mathbf{X}} \text{tr}(f(\mathbf{X})) = \text{tr} \left( \frac{\partial f(\mathbf{X})}{\partial \mathbf{X}} \right) $$
- Derivative of a Determinant $$ \frac{\partial}{\partial \mathbf{X}} \det(f(\mathbf{X})) = \det(f(\mathbf{X})) \text{tr} \left( f^{-1}(\mathbf{X}) \frac{\partial f(\mathbf{X})}{\partial \mathbf{X}} \right) $$
- Derivative of an Inverse $$ \frac{\partial}{\partial \mathbf{X}} f^{-1}(\mathbf{X}) = -f^{-1}(\mathbf{X}) \frac{\partial f(\mathbf{X})}{\partial \mathbf{X}} f^{-1}(\mathbf{X}) $$
- Derivatives of Quadratic Forms and Linear Mappings
  - With respect to matrix \(\mathbf{X}\): $$ \frac{\partial \mathbf{a}^{\top} \mathbf{X}^{-1} \mathbf{b}}{\partial \mathbf{X}} = -(\mathbf{X}^{-1})^{\top} \mathbf{a} \mathbf{b}^{\top} (\mathbf{X}^{-1})^{\top} \qquad \frac{\partial \mathbf{a}^{\top} \mathbf{X} \mathbf{b}}{\partial \mathbf{X}} = \mathbf{a} \mathbf{b}^{\top} $$
  - With respect to vector \(\mathbf{x}\): $$ \frac{\partial \mathbf{x}^{\top} \mathbf{a}}{\partial \mathbf{x}} = \mathbf{a}^{\top} \qquad \frac{\partial \mathbf{a}^{\top} \mathbf{x}}{\partial \mathbf{x}} = \mathbf{a}^{\top} \qquad \frac{\partial \mathbf{x}^{\top} \mathbf{B} \mathbf{x}}{\partial \mathbf{x}} = \mathbf{x}^{\top} (\mathbf{B} + \mathbf{B}^{\top}) $$
- Derivative of Least Squares / Weighted Loss Function If \(\mathbf{W}\) is a symmetric matrix: $$ \frac{\partial}{\partial \mathbf{s}} (\mathbf{x} - \mathbf{A}\mathbf{s})^{\top} \mathbf{W} (\mathbf{x} - \mathbf{A}\mathbf{s}) = -2(\mathbf{x} - \mathbf{A}\mathbf{s})^{\top} \mathbf{W} \mathbf{A} $$
Constrained Optimization
Lagrange Multipliers
For equality-constrained optimization problems: $$ \min_x f(x) \quad \text{s.t.} \quad h_j(x) = 0, \; j = 1, \dots, m $$
Construct the Lagrangian: $$ \mathcal{L}(x, \lambda) = f(x) + \sum_{j=1}^{m} \lambda_j h_j(x) $$
Necessary conditions (first-order conditions): $$ \nabla_x \mathcal{L} = \nabla f(x) + \sum_{j} \lambda_j \nabla h_j(x) = 0, \qquad \nabla_{\lambda} \mathcal{L} = 0 \iff h_j(x) = 0 $$
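
A worked sketch with sympy (assumed available; the objective and constraint are illustrative): maximize \(f(x, y) = xy\) subject to \(x + y = 1\).

```python
import sympy as sp

# Solve max f(x, y) = x*y subject to x + y = 1 with a Lagrange multiplier.
x, y, lam = sp.symbols('x y lam')
L = x * y + lam * (x + y - 1)   # the Lagrangian
stationary = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(stationary)   # x = y = 1/2 (with lam = -1/2), the constrained optimum
```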
KKT Conditions
For inequality-constrained optimization problems: $$ \min_x f(x) \quad \text{s.t.} \quad g_i(x) \leq 0, \;\; h_j(x) = 0 $$
KKT conditions (necessary under standard constraint qualifications; also sufficient for convex problems):
- Stationarity: \(\nabla f(x^*) + \sum \mu_i \nabla g_i(x^*) + \sum \lambda_j \nabla h_j(x^*) = 0\)
- Primal feasibility: \(g_i(x^*) \leq 0, \; h_j(x^*) = 0\)
- Dual feasibility: \(\mu_i \geq 0\)
- Complementary slackness: \(\mu_i g_i(x^*) = 0\)
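
A worked instance on an illustrative one-dimensional problem: minimize \(f(x) = x^2\) subject to \(g(x) = 1 - x \le 0\). Stationarity gives \(2x^* - \mu = 0\), and complementary slackness \(\mu(1 - x^*) = 0\) rules out \(\mu = 0\) (which would give the infeasible \(x^* = 0\)), so

$$ x^* = 1, \qquad \mu = 2 \ge 0 $$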
Convexity
Convex Sets and Convex Functions
Convex set: \(\forall x, y \in C, \; \forall \theta \in [0,1]: \; \theta x + (1-\theta)y \in C\)
Convex function: \(f(\theta x + (1-\theta)y) \leq \theta f(x) + (1-\theta)f(y)\) for all \(x, y\) in the (convex) domain and \(\theta \in [0,1]\)
Second-order condition: a twice-differentiable \(f\) is convex \(\Leftrightarrow\) its Hessian is positive semi-definite everywhere: \(H \succeq 0\)
The importance of convex optimization: every local optimum is also a global optimum.
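
A quick check of the second-order condition on an illustrative function: \(f(x, y) = x^2 + xy + y^2\) has the constant Hessian

$$ H = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad \text{eigenvalues } 1 \text{ and } 3 > 0, $$

so \(H \succ 0\) and \(f\) is (strictly) convex; its lone stationary point \((0, 0)\) is the global minimum.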
Matrix Calculus
Scalar-to-Vector Derivatives
Convention: in the table below, the derivative of a scalar with respect to a column vector is itself a column vector (denominator layout). This differs from the row-vector (numerator-layout) results such as \(\frac{\partial \mathbf{x}^{\top} \mathbf{a}}{\partial \mathbf{x}} = \mathbf{a}^{\top}\) used in the identities above; both conventions appear in the literature.
Common Matrix Calculus Identities
| Function | Derivative |
|---|---|
| \(a^T x\) | \(a\) |
| \(x^T A x\) | \((A + A^T)x\) |
| \(\|Ax - b\|^2\) | \(2A^T(Ax - b)\) |
| \(\text{tr}(AB)\) | \(B^T\) (w.r.t. \(A\)) |
| \(\log\det(X)\) | \((X^{-1})^T\) (equals \(X^{-1}\) for symmetric \(X\)) |
These identities are essential when deriving ML algorithms such as least squares, logistic regression, and Gaussian processes.
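
As a concrete instance of the \(\|Ax - b\|^2\) row, a short numpy check (random data and shapes are illustrative): setting the gradient \(2A^T(Ax - b)\) to zero yields the normal equations \(A^T A x = A^T b\), whose solution matches the library solver.

```python
import numpy as np

# Solve least squares two ways: via the normal equations from the gradient
# identity, and via numpy's built-in solver; the answers coincide.
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)       # A^T A x = A^T b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)    # library least squares
print(np.allclose(x_normal, x_lstsq))  # True
```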
See Optimization Theory for more comprehensive coverage of convex optimization.