
Calculus

Single-Variable Calculus

Limits & Continuity

  • Condition for a Limit to Exist (the left- and right-hand limits agree) $$ \lim_{x \to a^-} f(x) = \lim_{x \to a^+} f(x) = L $$
  • Continuity Criterion $$ \lim_{x \to x_0} f(x) = f(x_0) $$
  • Intermediate Value Theorem If \(f(x)\) is continuous on \([a, b]\) and \(M\) lies between \(f(a)\) and \(f(b)\), then there exists at least one \(c \in (a, b)\) such that: $$ f(c) = M $$

Definition of Derivative

  • Derivative Definition (Instantaneous Rate of Change) $$ f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} $$
  • Differential $$ dy = f'(x) dx $$
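
A minimal numeric sketch of the limit definition (plain Python; the example \(f(x) = x^2\) and the step sizes are illustrative choices):

```python
# Forward-difference quotient (f(x+h) - f(x)) / h approaching f'(x),
# illustrated on f(x) = x**2 at x = 3, where the exact derivative is 6.
f = lambda x: x**2
x = 3.0
for h in [1e-1, 1e-2, 1e-4, 1e-6]:
    print(f"h = {h:.0e}:  difference quotient = {(f(x + h) - f(x)) / h:.8f}")
# As h shrinks, the quotient tends to f'(3) = 6.
```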

Common Derivatives

  • Constant: \((C)' = 0\)
  • Power Function: \((x^n)' = nx^{n-1}\)
  • Exponential Function: \((e^x)' = e^x\) ; \((a^x)' = a^x \ln a\)
  • Logarithmic Function: \((\ln x)' = \frac{1}{x}\) ; \((\log_a x)' = \frac{1}{x \ln a}\)
  • Trigonometric Functions:
    • \((\sin x)' = \cos x\)
    • \((\cos x)' = -\sin x\)
    • \((\tan x)' = \sec^2 x\)

Differentiation Rules

  • Sum/Difference Rule: \((u \pm v)' = u' \pm v'\)
  • Product Rule: \((uv)' = u'v + uv'\)
  • Quotient Rule: \((\frac{u}{v})' = \frac{u'v - uv'}{v^2}\)
  • Chain Rule:

    \[ \frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx} \]

    or \([f(g(x))]' = f'(g(x)) \cdot g'(x)\)
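
These rules can be spot-checked numerically. A small sketch (plain Python; the choices \(u = \sin\), \(v = \exp\) and the central-difference step are assumptions for illustration):

```python
import math

def d(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.3
# Product rule: (uv)' = u'v + uv', with u = sin, v = exp
lhs = d(lambda t: math.sin(t) * math.exp(t), x)
rhs = math.cos(x) * math.exp(x) + math.sin(x) * math.exp(x)
print(lhs, rhs)  # agree to ~1e-9

# Chain rule: [f(g(x))]' = f'(g(x)) * g'(x), with f = exp, g = sin
lhs = d(lambda t: math.exp(math.sin(t)), x)
rhs = math.exp(math.sin(x)) * math.cos(x)
print(lhs, rhs)
```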

Higher-Order Derivatives

  • Second Derivative: \(y'' = \frac{d}{dx}(\frac{dy}{dx}) = \frac{d^2y}{dx^2}\)
  • \(n\)-th Order Derivative Notation: \(f^{(n)}(x)\)

Geometric Applications

  • Slope of the Tangent Line: \(k = f'(x_0)\)
  • Equation of the Tangent Line: \(y - y_0 = f'(x_0)(x - x_0)\)
  • Equation of the Normal Line: \(y - y_0 = -\frac{1}{f'(x_0)}(x - x_0)\)
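
The two line equations translate directly into code. A short sketch (plain Python; \(f(x) = x^2\) and \(x_0 = 1\) are illustrative):

```python
# Tangent and normal lines to f(x) = x**2 at x0 = 1, where f'(x) = 2x.
f = lambda x: x**2
fprime = lambda x: 2 * x

x0 = 1.0
y0, k = f(x0), fprime(x0)              # point of tangency and slope

tangent = lambda x: y0 + k * (x - x0)  # y - y0 = f'(x0)(x - x0)
normal = lambda x: y0 - (x - x0) / k   # slope -1/f'(x0); requires k != 0

print(tangent(2.0))  # 3.0
print(normal(2.0))   # 0.5
```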

Integrals (Supplementary)

  • Fundamental Theorem of Calculus (Newton-Leibniz Formula)

    \[ \int_a^b f(x) dx = F(b) - F(a) \]

    (Note: \(F(x)\) is an antiderivative of \(f(x)\), i.e., \(F'(x) = f(x)\))

Mean Value Theorems

  • Rolle's Theorem If \(f(x)\) is continuous on \([a,b]\), differentiable on \((a,b)\), and \(f(a)=f(b)\), then: $$ \exists \xi \in (a,b), \text{ s.t. } f'(\xi) = 0 $$
  • Lagrange Mean Value Theorem (Lagrange MVT) If \(f(x)\) is continuous on \([a,b]\) and differentiable on \((a,b)\), then: $$ \exists \xi \in (a,b), \text{ s.t. } f'(\xi) = \frac{f(b) - f(a)}{b - a} $$
  • Cauchy Mean Value Theorem (Cauchy MVT) If \(f(x), F(x)\) both satisfy the above conditions and \(F'(x) \neq 0\) on \((a,b)\), then: $$ \exists \xi \in (a,b), \text{ s.t. } \frac{f'(\xi)}{F'(\xi)} = \frac{f(b) - f(a)}{F(b) - F(a)} $$
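
When \(f'\) is monotone, the MVT point \(\xi\) can be located numerically. A sketch (plain Python; \(f(x) = x^3\) on \([0, 2]\) is an illustrative choice with closed-form answer \(\xi = 2/\sqrt{3}\)):

```python
# Solve f'(xi) = (f(b) - f(a)) / (b - a) by bisection for f(x) = x**3 on [0, 2]:
# 3*xi**2 = 4, so xi = 2/sqrt(3) ~ 1.1547.
f = lambda x: x**3
fprime = lambda x: 3 * x**2
a, b = 0.0, 2.0
secant = (f(b) - f(a)) / (b - a)

lo, hi = a, b              # f' is increasing on [0, 2], so bisection applies
for _ in range(60):
    mid = (lo + hi) / 2
    if fprime(mid) < secant:
        lo = mid
    else:
        hi = mid
print(lo, 2 / 3**0.5)      # both ~1.1547
```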

Differentiation & Integration

  • Differential

    \[ dy = f'(x) dx \]

    (Interpretation: local linearization of a nonlinear function, approximating the curve's increment by the tangent line's increment)

  • Definite Integral (see the Riemann-sum sketch after this list)

    $$ \int_a^b f(x) dx = \lim_{n \to \infty} \sum_{i=1}^n f(\xi_i) \Delta x $$

  • Indefinite Integral

    $$ \int f(x) dx = F(x) + C $$

  • Fundamental Theorem of Calculus (Newton-Leibniz Formula)

    \[ \int_a^b f(x) dx = F(b) - F(a) \]
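
A Riemann-sum sketch of the definite integral (assuming NumPy; the integrand \(x^2\) on \([0, 1]\) is illustrative, with Newton-Leibniz value \(1/3\)):

```python
import numpy as np

# Left-endpoint Riemann sums of f(x) = x**2 on [0, 1] converging to
# F(1) - F(0) = 1/3 for the antiderivative F(x) = x**3 / 3.
f = lambda x: x**2
a, b = 0.0, 1.0
for n in [10, 100, 10_000]:
    x = np.linspace(a, b, n, endpoint=False)  # left endpoints xi_i
    dx = (b - a) / n
    print(n, f(x).sum() * dx)                 # approaches 1/3
```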

Applications of Derivatives: Monotonicity and Extrema

  • Stationary Point A point where \(f'(x) = 0\).
  • Monotonicity Test
    • \(f'(x) > 0 \implies\) monotonically increasing
    • \(f'(x) < 0 \implies\) monotonically decreasing
  • Concavity Test
    • \(f''(x) > 0 \implies\) concave up (convex)
    • \(f''(x) < 0 \implies\) concave down
  • Second Derivative Test If \(f'(x_0) = 0\):
    • \(f''(x_0) < 0 \implies\) local maximum
    • \(f''(x_0) > 0 \implies\) local minimum
  • Saddle Point (1D) A point where \(f'(x) = 0\) but no extremum occurs: the function keeps the same direction of increase/decrease on both sides (e.g., \(f(x) = x^3\) at \(x = 0\); see the classification sketch below).
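
The tests above condense into a small classifier. A sketch (plain Python; the example functions \(x^3 - 3x\) and \(x^3\) are illustrative):

```python
# Second derivative test at stationary points of f(x) = x**3 - 3*x,
# where f'(x) = 3*x**2 - 3 vanishes at x = -1 and x = 1 and f''(x) = 6*x;
# g(x) = x**3 at x = 0 shows the inconclusive (non-extremal) case.
def classify(fpp_at_x0):
    if fpp_at_x0 < 0:
        return "local maximum"
    if fpp_at_x0 > 0:
        return "local minimum"
    return "inconclusive: examine the sign of f' on both sides"

fpp = lambda x: 6 * x
print(-1, classify(fpp(-1)))  # local maximum
print(+1, classify(fpp(+1)))  # local minimum
print(0, classify(0.0))       # inconclusive, e.g. g(x) = x**3 at x = 0
```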

Taylor's Formula

  • Core Idea Approximate a function near a given point using a polynomial by matching derivatives of all orders.
  • Taylor Expansion $$ f(x) \approx f(x_0) + f'(x_0)(x-x_0) + \frac{f''(x_0)}{2!}(x-x_0)^2 + \dots + \frac{f^{(n)}(x_0)}{n!}(x-x_0)^n $$
  • Maclaurin Series The Taylor expansion centered at \(x_0 = 0\): $$ f(x) \approx \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} x^n $$
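
A convergence sketch for the Maclaurin series of \(e^x\) (plain Python; the evaluation point \(x = 1\) is illustrative):

```python
import math

# Partial sums sum_{k=0}^{n} x**k / k! of the Maclaurin series of e^x.
def maclaurin_exp(x, n):
    return sum(x**k / math.factorial(k) for k in range(n + 1))

x = 1.0
for n in [1, 2, 4, 8]:
    approx = maclaurin_exp(x, n)
    print(n, approx, abs(approx - math.exp(x)))  # error shrinks rapidly with n
```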


Multivariable Calculus

Partial Derivatives

  • Core Concept: Hold all other variables constant and study the rate of change of the function along a single coordinate axis.
  • Definition (partial derivative with respect to \(x\) as an example):
\[ \frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x+h, y) - f(x, y)}{h} \]
  • Higher-Order Mixed Partial Derivatives: If the partial derivatives are continuous, the order of differentiation does not matter:
\[ \frac{\partial^2 z}{\partial x \partial y} = \frac{\partial^2 z}{\partial y \partial x} \]
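
The symmetry of mixed partials can be verified with a finite-difference stencil. A sketch (plain Python; \(f(x, y) = x^2 y + \sin(xy)\) and the step \(h\) are illustrative):

```python
import math

# 4-point central stencil for d2f/dxdy of f(x, y) = x**2 * y + sin(x*y),
# compared with the analytic mixed partial 2x + cos(xy) - xy*sin(xy).
f = lambda x, y: x**2 * y + math.sin(x * y)
h = 1e-4
x, y = 0.7, 1.2

mixed = (f(x + h, y + h) - f(x + h, y - h)
         - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
exact = 2 * x + math.cos(x * y) - x * y * math.sin(x * y)
print(mixed, exact)  # agree to ~1e-7: the order of differentiation is immaterial
```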

Directional Derivative

  • Core Concept: The rate of change of a multivariable function along any specified direction \(u\).
  • Formula (simplified using partial derivatives):
\[ D_u f(x, y) = f_x(x, y) \cos \theta + f_y(x, y) \sin \theta \]

(Note: \(\theta\) is the angle between the unit direction vector \(u\) and the positive \(x\)-axis; the formula assumes \(f\) is differentiable at the point)
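
A direct evaluation of the formula (plain Python; \(f(x, y) = x^2 + xy\), the point \((1, 2)\), and \(\theta = \pi/3\) are illustrative):

```python
import math

# D_u f = f_x * cos(theta) + f_y * sin(theta) for f(x, y) = x**2 + x*y,
# whose partials are f_x = 2x + y and f_y = x.
fx = lambda x, y: 2 * x + y
fy = lambda x, y: x

x, y, theta = 1.0, 2.0, math.pi / 3
D_u = fx(x, y) * math.cos(theta) + fy(x, y) * math.sin(theta)
print(D_u)  # 4*cos(pi/3) + 1*sin(pi/3) = 2 + sqrt(3)/2
```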

Gradient \(\nabla f\)

  • Core Concept: A vector composed of all partial derivatives. It points in the direction of the steepest ascent of the function.
  • Definition:
\[ \nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n} \right) \]
  • Properties:
    • Maximum Rate of Change: The directional derivative attains its maximum value in the direction of the gradient, and that maximum equals the magnitude \(\|\nabla f\|\).
    • Gradient Descent: In machine learning, moving in the direction opposite to the gradient, \(-\nabla f\), yields the fastest decrease of the function (a minimal loop follows below).
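
A minimal gradient-descent loop (assuming NumPy; the quadratic objective, the target point \(c\), the step size, and the iteration count are illustrative):

```python
import numpy as np

# Gradient descent on f(x) = ||x - c||^2 / 2, whose gradient is x - c;
# stepping along -grad f drives the iterate toward the minimizer c.
c = np.array([3.0, -1.0])
grad = lambda x: x - c

x = np.zeros(2)
lr = 0.1                  # step size (learning rate)
for _ in range(100):
    x -= lr * grad(x)     # move opposite to the gradient
print(x)                  # ~ [3, -1]
```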

Jacobian Matrix \(J\)

  • Core Concept: The first-order derivative matrix of a vector-valued function with respect to a vector. It describes the local linear transformation in a multidimensional space.
  • Definition: Given \(F: \mathbb{R}^n \to \mathbb{R}^m\):
\[ J = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \dots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \dots & \frac{\partial f_m}{\partial x_n} \end{bmatrix} \]
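
A finite-difference Jacobian sketch (assuming NumPy; the map \(F(x, y) = (xy,\ x + y^2)\) is illustrative, with analytic Jacobian \(\begin{bmatrix} y & x \\ 1 & 2y \end{bmatrix}\)):

```python
import numpy as np

def F(v):
    x, y = v
    return np.array([x * y, x + y**2])

def jacobian(F, v, h=1e-6):
    """Approximate J column by column with central differences."""
    m, n = len(F(v)), len(v)
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (F(v + e) - F(v - e)) / (2 * h)  # column j: dF/dx_j
    return J

print(jacobian(F, np.array([1.0, 2.0])))  # ~ [[2, 1], [1, 4]]
```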

Hessian Matrix \(H\)

  • Core Concept: A symmetric matrix composed of all second-order partial derivatives of a multivariable function. It describes the curvature (concavity/convexity) of the function.
  • Definition (shown for two variables; in general \(H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}\)):
\[ H = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} \end{bmatrix} \]

Optimization of Multivariable Functions

  • Necessary Condition: At an interior extremum of a differentiable function, the gradient must be the zero vector: \(\nabla f = \mathbf{0}\) (such a point is called a stationary point).
  • Sufficient Conditions (based on the Hessian matrix \(H\)):
    • \(H\) is positive definite: The stationary point is a local minimum (analogous to \(f''(x)>0\)).
    • \(H\) is negative definite: The stationary point is a local maximum (analogous to \(f''(x)<0\)).
    • \(H\) is indefinite: The stationary point is a saddle point.
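
The eigenvalue signs of \(H\) decide its definiteness, so the classification is a few lines (assuming NumPy; the two example Hessians come from \(x^2 - y^2\) and \(x^2 + y^2\) at the origin):

```python
import numpy as np

def classify(H):
    """Classify a stationary point from its (symmetric) Hessian."""
    eig = np.linalg.eigvalsh(H)
    if np.all(eig > 0):
        return "local minimum"      # positive definite
    if np.all(eig < 0):
        return "local maximum"      # negative definite
    if np.any(eig > 0) and np.any(eig < 0):
        return "saddle point"       # indefinite
    return "inconclusive (singular Hessian)"

print(classify(np.array([[2.0, 0.0], [0.0, -2.0]])))  # saddle: f = x^2 - y^2
print(classify(np.array([[2.0, 0.0], [0.0, 2.0]])))   # local min: f = x^2 + y^2
```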

Common Differentiation Identities

  • Derivative of a Transpose $$ \frac{\partial}{\partial \mathbf{X}} f(\mathbf{X})^{\top} = \left( \frac{\partial f(\mathbf{X})}{\partial \mathbf{X}} \right)^{\top} $$
  • Derivative of a Trace $$ \frac{\partial}{\partial \mathbf{X}} \text{tr}(f(\mathbf{X})) = \text{tr} \left( \frac{\partial f(\mathbf{X})}{\partial \mathbf{X}} \right) $$
  • Derivative of a Determinant $$ \frac{\partial}{\partial \mathbf{X}} \det(f(\mathbf{X})) = \det(f(\mathbf{X})) \text{tr} \left( f^{-1}(\mathbf{X}) \frac{\partial f(\mathbf{X})}{\partial \mathbf{X}} \right) $$
  • Derivative of an Inverse $$ \frac{\partial}{\partial \mathbf{X}} f^{-1}(\mathbf{X}) = -f^{-1}(\mathbf{X}) \frac{\partial f(\mathbf{X})}{\partial \mathbf{X}} f^{-1}(\mathbf{X}) $$
  • Derivatives of Quadratic Forms and Linear Mappings
    • With respect to matrix \(\mathbf{X}\):

      \[ \frac{\partial \mathbf{a}^{\top} \mathbf{X}^{-1} \mathbf{b}}{\partial \mathbf{X}} = -(\mathbf{X}^{-1})^{\top} \mathbf{a} \mathbf{b}^{\top} (\mathbf{X}^{-1})^{\top} \]

      $$ \frac{\partial \mathbf{a}^{\top} \mathbf{X} \mathbf{b}}{\partial \mathbf{X}} = \mathbf{a} \mathbf{b}^{\top} $$

    • With respect to vector \(\mathbf{x}\) (numerator layout: the derivative of a scalar with respect to a column vector is written as a row vector):

      \[ \frac{\partial \mathbf{x}^{\top} \mathbf{a}}{\partial \mathbf{x}} = \mathbf{a}^{\top} \]
      \[ \frac{\partial \mathbf{a}^{\top} \mathbf{x}}{\partial \mathbf{x}} = \mathbf{a}^{\top} \]
      \[ \frac{\partial \mathbf{x}^{\top} \mathbf{B} \mathbf{x}}{\partial \mathbf{x}} = \mathbf{x}^{\top} (\mathbf{B} + \mathbf{B}^{\top}) \]

  • Derivative of Least Squares / Weighted Loss Function If \(\mathbf{W}\) is a symmetric matrix (a numeric check of these identities follows this list):

    \[ \frac{\partial}{\partial \mathbf{s}} (\mathbf{x} - \mathbf{A}\mathbf{s})^{\top} \mathbf{W} (\mathbf{x} - \mathbf{A}\mathbf{s}) = -2(\mathbf{x} - \mathbf{A}\mathbf{s})^{\top} \mathbf{W} \mathbf{A} \]
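
The vector identities can be checked against finite differences (assuming NumPy; the random problem sizes and the diagonal symmetric \(\mathbf{W}\) are illustrative; the checks compare gradients, i.e. the transposes of the row-vector derivatives above):

```python
import numpy as np

def num_grad(f, v, h=1e-6):
    """Central-difference gradient of a scalar function of a vector."""
    g = np.zeros_like(v)
    for i in range(len(v)):
        e = np.zeros_like(v)
        e[i] = h
        g[i] = (f(v + e) - f(v - e)) / (2 * h)
    return g

rng = np.random.default_rng(0)

# d(x^T B x)/dx = x^T (B + B^T), i.e. gradient (B + B^T) x
B = rng.standard_normal((3, 3))
x = rng.standard_normal(3)
print(np.allclose(num_grad(lambda v: v @ B @ v, x), (B + B.T) @ x))  # True

# d/ds (x - A s)^T W (x - A s) = -2 (x - A s)^T W A for symmetric W
A = rng.standard_normal((4, 3))
W = np.diag([1.0, 2.0, 3.0, 4.0])               # symmetric weight matrix
s, xv = rng.standard_normal(3), rng.standard_normal(4)
loss = lambda s: (xv - A @ s) @ W @ (xv - A @ s)
print(np.allclose(num_grad(loss, s), -2 * A.T @ W @ (xv - A @ s)))   # True
```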

Constrained Optimization

Lagrange Multipliers

For equality-constrained optimization problems:

\[\min f(x) \quad \text{s.t.} \quad g_i(x) = 0, \; i = 1, \ldots, m\]

Construct the Lagrangian:

\[\mathcal{L}(x, \lambda) = f(x) + \sum_{i=1}^{m} \lambda_i g_i(x)\]

Necessary conditions (first-order conditions):

\[\nabla_x \mathcal{L} = 0, \quad \nabla_\lambda \mathcal{L} = 0\]
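
For a quadratic objective with a linear constraint these first-order conditions form a linear system. A worked sketch (assuming NumPy; the problem \(\min x^2 + y^2\) s.t. \(x + y = 1\) is an illustrative choice):

```python
import numpy as np

# L(x, y, lam) = x**2 + y**2 + lam*(x + y - 1); the first-order conditions
#   2x + lam = 0,  2y + lam = 0,  x + y = 1
# are linear, so one solve yields the stationary point of the Lagrangian.
M = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
rhs = np.array([0.0, 0.0, 1.0])
x, y, lam = np.linalg.solve(M, rhs)
print(x, y, lam)  # 0.5 0.5 -1.0: the closest point on x + y = 1 to the origin
```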

KKT Conditions

For inequality-constrained optimization:

\[\min f(x) \quad \text{s.t.} \quad g_i(x) \leq 0, \; h_j(x) = 0\]

KKT conditions (necessary; also sufficient for convex optimization):

  1. Stationarity: \(\nabla f(x^*) + \sum \mu_i \nabla g_i(x^*) + \sum \lambda_j \nabla h_j(x^*) = 0\)
  2. Primal feasibility: \(g_i(x^*) \leq 0, \; h_j(x^*) = 0\)
  3. Dual feasibility: \(\mu_i \geq 0\)
  4. Complementary slackness: \(\mu_i g_i(x^*) = 0\)
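
A hand-check of the four conditions on a one-dimensional problem (plain Python; the problem \(\min x^2\) s.t. \(1 - x \leq 0\), with solution \(x^* = 1\) and multiplier \(\mu = 2\), is an illustrative choice):

```python
# min x**2  s.t.  g(x) = 1 - x <= 0. The constraint is active at x* = 1.
x_star, mu = 1.0, 2.0

stationarity = 2 * x_star + mu * (-1.0)  # f'(x*) + mu * g'(x*) = 0
primal = (1.0 - x_star) <= 0             # g(x*) <= 0
dual = mu >= 0                           # mu >= 0
slackness = mu * (1.0 - x_star)          # mu * g(x*) = 0

print(stationarity, primal, dual, slackness)  # 0.0 True True 0.0
```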

Convexity

Convex Sets and Convex Functions

Convex set: \(\forall x, y \in C, \; \forall \theta \in [0,1]: \; \theta x + (1-\theta)y \in C\)

Convex function: \(f(\theta x + (1-\theta)y) \leq \theta f(x) + (1-\theta)f(y)\)

Second-order condition: a twice-differentiable \(f\) is convex \(\Leftrightarrow\) its Hessian matrix is positive semi-definite everywhere: \(H \succeq 0\)

The importance of convex optimization: local optimum = global optimum.
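
The defining inequality is easy to probe numerically (assuming NumPy; \(f(x) = \|x\|^2\), the dimension, and the sample count are illustrative):

```python
import numpy as np

# Sample the convexity inequality f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y)
# for f(v) = ||v||^2 at random points and interpolation weights t.
rng = np.random.default_rng(1)
f = lambda v: float(v @ v)

ok = True
for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    t = rng.uniform()
    ok &= f(t * x + (1 - t) * y) <= t * f(x) + (1 - t) * f(y) + 1e-12
print(ok)  # True: ||.||^2 is convex
```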

Matrix Calculus

Scalar-to-Vector Derivatives

\[\frac{\partial}{\partial x}(a^T x) = a, \quad \frac{\partial}{\partial x}(x^T A x) = (A + A^T)x\]

(These and the table below use the denominator/gradient layout, in which the derivative of a scalar is a column vector; they are the transposes of the numerator-layout identities given earlier.)

Common Matrix Calculus Identities

| Function | Derivative |
| --- | --- |
| \(a^T x\) | \(a\) |
| \(x^T A x\) | \((A + A^T)x\) |
| \(\lVert Ax - b \rVert^2\) | \(2A^T(Ax - b)\) |
| \(\text{tr}(AB)\) (w.r.t. \(A\)) | \(B^T\) |
| \(\log\det(X)\) (w.r.t. \(X\)) | \(X^{-1}\) |

These identities are essential when deriving ML algorithms such as least squares, logistic regression, and Gaussian processes.
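
As one example, the \(\lVert Ax - b \rVert^2\) entry yields the least-squares normal equations. A sketch (assuming NumPy; the random problem sizes are illustrative):

```python
import numpy as np

# Setting the gradient 2 A^T (A x - b) to zero gives A^T A x = A^T b.
rng = np.random.default_rng(2)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)         # normal-equation solution
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)      # library least squares
print(np.allclose(x_normal, x_lstsq))                # True
print(np.linalg.norm(2 * A.T @ (A @ x_normal - b)))  # gradient ~ 0 at the optimum
```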

See Optimization Theory for more comprehensive coverage of convex optimization.

