Linear Algebra
Reference notes on linear algebra
Vectors
Vector Basics
- Core Concept: Vectors are the fundamental elements of linear algebra. In linear algebra, a vector is typically visualized as a directed arrow originating from the origin, or represented as an ordered list of numbers.
- Basic Operations:
- Addition: Corresponding components are added together. Geometrically, this follows the "parallelogram rule" and represents successive displacements. $$ \begin{bmatrix} a \\ b \end{bmatrix} + \begin{bmatrix} c \\ d \end{bmatrix} = \begin{bmatrix} a+c \\ b+d \end{bmatrix} $$
- Scalar Multiplication: A scalar is multiplied with each component of the vector. Geometrically, this corresponds to scaling. $$ k \cdot \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} ka \\ kb \end{bmatrix} $$
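A minimal NumPy sketch of these two operations (the vectors chosen here are arbitrary examples):

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

print(a + b)    # component-wise addition -> [4. 1.]
print(2.5 * a)  # scalar multiplication   -> [2.5 5. ]
```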
Linear Combination
- Core Concept: The operation of independently scaling a set of vectors and then adding them together.
- Formula: $$ \mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n $$
(Note: \(c_i\) are scalar coefficients)
Span
- Core Concept: The set of all points reachable through every possible linear combination of a given set of vectors (i.e., the space they generate).
- Possible Structures:
- One-dimensional: Collinear vectors span a line.
- Two-dimensional: Non-collinear vectors span a plane.
- Three-dimensional: Non-coplanar vectors span the entire 3D space.
Linear Independence
- Core Concept: Describes whether there are "redundant" dimensions within a set of vectors.
- Linearly Dependent:
- At least one vector in the set can be expressed as a linear combination of the others.
- Geometric meaning: Some vector lies within the span of the remaining vectors (e.g., three vectors that are coplanar).
- Linearly Independent:
- No vector can be expressed in terms of the others.
- Geometric meaning: Each vector contributes a new dimension to the space.
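A quick numerical test for independence is to stack the vectors as columns and compare the rank with the number of vectors; a sketch with made-up vectors:

```python
import numpy as np

v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = v1 + 2 * v2                     # deliberately a combination of v1 and v2

M = np.column_stack([v1, v2, v3])
print(np.linalg.matrix_rank(M))      # 2 < 3 -> the set is linearly dependent
```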
Basis & Dimension
- Basis: A basis of a vector space \(V\) must satisfy two conditions simultaneously:
- The vectors are linearly independent.
- The vectors span the entire space \(V\).
- Dimension:
- The number of vectors in a basis of the space \(V\), denoted \(\text{dim}(V)\).
- Key result: A given space may have infinitely many bases, but every basis contains the same number of vectors (the dimension).
Matrices
Linear Transformations
Core Concept: A linear transformation is a special mapping that keeps the origin fixed and preserves the parallelism and equal spacing of grid lines. Its essence is a description of how space moves.
- Conditions for linearity:
- Straight lines remain straight (no curving).
- The origin remains fixed (no translation).
- Mathematical characterization: A linear transformation is completely determined by the positions of the basis vectors after the transformation.
- The essence of a matrix: Arranging the coordinates of the transformed basis vectors as columns yields the matrix that describes the transformation. $$ A = \begin{bmatrix} \text{transformed } \mathbf{i} & \text{transformed } \mathbf{j} \end{bmatrix} = \begin{bmatrix} a & c \\ b & d \end{bmatrix} $$
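A small sketch of this idea in NumPy, using a \(90^\circ\) counter-clockwise rotation as the example transformation (the vectors are illustrative):

```python
import numpy as np

i_hat_transformed = np.array([0.0, 1.0])    # where i-hat lands
j_hat_transformed = np.array([-1.0, 0.0])   # where j-hat lands

A = np.column_stack([i_hat_transformed, j_hat_transformed])
v = np.array([3.0, 2.0])
print(A @ v)    # [-2. 3.]: v rotated 90 degrees counter-clockwise
```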
Matrices & Basic Operations
Core Concept: A matrix is a rectangular array of numbers. Addition and scalar multiplication act element-wise, while multiplication represents the composition (sequential application) of linear transformations.
| Topic | Core Definition | Formula |
|---|---|---|
| Matrix Definition | A two-dimensional numerical array of size \(m \times n\) | \(A = (a_{ij})_{m \times n}\) |
| Matrix Addition | Element-wise addition of matrices with the same shape | \(C_{ij} = A_{ij} + B_{ij}\) |
| Scalar Multiplication | A scalar acts on every element of the matrix | \(B = kA \implies B_{ij} = k \cdot a_{ij}\) |
| Identity Matrix (\(I\)) | A square matrix with ones on the diagonal, representing the "identity transformation" | \(AI = IA = A\) |
Matrix Multiplication
Core Concept: The product \(AB\) means: first apply transformation \(B\), then apply transformation \(A\).
- Computational logic: Each column of the product \(AB\) is the result of applying the left matrix \(A\) to the corresponding column of the right matrix \(B\) (i.e., to the transformed basis vectors recorded in \(B\)).
- Formula: $$ C = AB \implies C_{ij} = \sum_{k=1}^{n} A_{ik}B_{kj} $$
- Key Properties:
- Not commutative: In general \(AB \neq BA\) (the order in which the transformations are applied affects the final outcome).
- Associative: \((AB)C = A(BC)\) (applying three transformations in sequence yields the same result regardless of how they are grouped).
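A sketch verifying both the composition reading of the product and the failure of commutativity (the rotation and shear matrices are arbitrary examples):

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])   # 90-degree rotation
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])    # shear

v = np.array([1.0, 2.0])
print(A @ (B @ v))                 # apply B first, then A
print((A @ B) @ v)                 # same result: the product AB encodes the composition
print(np.allclose(A @ B, B @ A))   # False: matrix multiplication is not commutative
```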
Determinant
Core Concept: The determinant measures the scaling factor that a linear transformation applies to areas (or volumes) of regions in space.
- Geometric interpretation:
- \(\det(A) > 0\): Space is stretched or compressed, but orientation is preserved.
- \(\det(A) = 0\): Space is collapsed to a lower dimension (e.g., a plane compressed into a line), meaning the matrix is not invertible.
- \(\det(A) < 0\): Space undergoes a "flip" (orientation is reversed).
- Formula for 2x2 matrices: $$ \det\left(\begin{bmatrix} a & c \\ b & d \end{bmatrix}\right) = ad - bc $$
- Multiplicative property: When two transformations are applied in succession, the overall scaling factor equals the product of the individual scaling factors. $$ \det(M_1 M_2) = \det(M_1) \cdot \det(M_2) $$
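A sketch of the scaling-factor interpretation and the multiplicative property (the matrices are arbitrary examples):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])    # det = 3*2 - 1*0 = 6: areas are scaled by 6
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])    # det = -1: a reflection, orientation flips

print(np.linalg.det(A))        # 6.0
print(np.linalg.det(B))        # -1.0
print(np.linalg.det(A @ B))    # -6.0 = det(A) * det(B)
```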
Inverse Matrix
Core Concept: The inverse matrix represents the reverse motion of a linear transformation; its physical meaning is to restore the transformed space back to its original state.
- Geometric criterion:
- If \(\det(A) \neq 0\): Space has not been collapsed, so a unique inverse transformation \(A^{-1}\) exists.
- If \(\det(A) = 0\): Space has been reduced in dimension and information is lost, making restoration impossible -- hence no inverse exists (singular matrix).
- Linear systems perspective: Solving \(A\mathbf{x} = \mathbf{v}\) is equivalent to finding a vector \(\mathbf{x}\) that, after being transformed by \(A\), lands on \(\mathbf{v}\).
- Core formula: $$ A^{-1}A = I \implies \mathbf{x} = A^{-1}\mathbf{v} $$
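A sketch of solving \(A\mathbf{x} = \mathbf{v}\) in NumPy (the matrix and vector are arbitrary, with \(\det(A) \neq 0\)); in practice `np.linalg.solve` is preferred over forming the inverse explicitly:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
v = np.array([3.0, 5.0])

x = np.linalg.solve(A, v)        # solve directly
print(x)
print(np.linalg.inv(A) @ v)      # x = A^{-1} v, same answer
print(np.allclose(A @ x, v))     # True: x lands on v after the transformation
```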
Rank
Core Concept: The rank represents the actual dimensionality of the space after a linear transformation.
- Interpretation:
- Rank 1: All vectors are mapped onto a single line (one-dimensional).
- Rank 2: All vectors are mapped onto a plane (two-dimensional).
- Full Rank: The rank equals the number of columns of the matrix, meaning no dimension is lost in the transformation.
- Equivalent definition: The maximum number of linearly independent columns (or rows) of a matrix \(A\).
Column Space
Core Concept: The space spanned by the column vectors of a matrix -- that is, the set of all possible outputs (images) of the linear transformation.
- Key relationships:
- Rank and space: The rank of a matrix = the dimension of its column space.
- Existence of solutions: The equation \(A\mathbf{x} = \mathbf{v}\) has a solution \(\iff\) the vector \(\mathbf{v}\) lies within the column space of \(A\).
Null Space (Kernel)
Core Concept: The set of all input vectors that are mapped to the origin (the zero vector) under a linear transformation.
- Geometric meaning: For a transformation that is not full rank, there exists a subspace (a point, a line, or a plane) whose every vector "vanishes" to the origin after the transformation.
- Algebraic meaning: The null space is the set of all solutions to the homogeneous equation \(A\mathbf{x} = \mathbf{0}\).
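A sketch computing the rank and a null-space direction of a rank-deficient matrix (the matrix is a made-up example whose third column is the sum of the first two, so one dimension collapses); the null-space vector is read off from the SVD introduced later in these notes:

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])

print(np.linalg.matrix_rank(A))   # 2: the column space is a plane in R^3

# The right-singular vector belonging to the (numerically) zero singular value
# spans the null space.
_, s, Vt = np.linalg.svd(A)
null_vec = Vt[-1]
print(np.allclose(A @ null_vec, 0))   # True: this direction is squashed to the origin
```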
Non-Square Matrices: Cross-Dimensional Linear Transformations
Core Concept: Non-square matrices describe mappings between spaces of different dimensions.
- Geometric meaning:
- A \(3 \times 2\) matrix maps 2D space into 3D space.
- A \(2 \times 3\) matrix maps 3D space into 2D space.
- Structural understanding:
- Number of columns: The dimension of the input space (the number of basis vectors).
- Number of rows: The dimension of the output space (the number of coordinates needed for each transformed basis vector).
Norms, Dot Products, and Orthogonality
Vector Norms
Core Concept: A norm is a generalized measure of a vector's "length" that satisfies non-negativity, homogeneity, and the triangle inequality.
- L1 norm (\(\|\mathbf{x}\|_1 = \sum_i |x_i|\)): Manhattan distance -- the sum of the absolute values of all components.
- L2 norm (\(\|\mathbf{x}\|_2 = \sqrt{\sum_i x_i^2}\)): Euclidean distance -- the physical length of the vector.
Dot Product
Core Concept: The dot product captures the projection effect of one vector onto another and reflects the degree of alignment between two vectors.
- Geometric interpretation:
- \(\mathbf{a} \cdot \mathbf{b} = 0 \implies\) orthogonal (perpendicular).
- \(\mathbf{a} \cdot \mathbf{b} > 0 \implies\) roughly the same direction (angle \(< 90^\circ\)).
- \(\mathbf{a} \cdot \mathbf{b} < 0 \implies\) roughly opposite directions (angle \(> 90^\circ\)).
- Mathematical formula: $$ \mathbf{a} \cdot \mathbf{b} = \sum a_i b_i = |\mathbf{a}| |\mathbf{b}| \cos \theta $$
- Duality: Any linear transformation from a multi-dimensional space to the one-dimensional number line can always be associated with a unique dual vector, such that the transformation is equivalent to taking the dot product with that vector.
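A sketch of norms, the dot product, and the angle formula (the vectors are arbitrary and happen to be orthogonal):

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([4.0, -3.0])

print(np.linalg.norm(a, ord=1))    # L1 norm: |3| + |4| = 7
print(np.linalg.norm(a))           # L2 norm: sqrt(3^2 + 4^2) = 5
print(np.dot(a, b))                # 0: the vectors are orthogonal

cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(np.degrees(np.arccos(cos_theta)))   # 90.0
```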
Positive Definite Matrix
Core Concept: A positive definite matrix describes a transformation that never rotates any vector past a right angle: the transformed vector \(A\mathbf{x}\) always makes an acute angle (less than \(90^\circ\)) with the original vector \(\mathbf{x}\).
- Conditions:
- \(A\) is a symmetric square matrix.
- For every non-zero vector \(\mathbf{x}\), the following holds: $$ \mathbf{x}^T A \mathbf{x} > 0 $$
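Two standard numerical checks for positive definiteness, sketched on an arbitrary symmetric example matrix:

```python
import numpy as np

A = np.array([[ 2.0, -1.0],
              [-1.0,  2.0]])

print(np.all(np.linalg.eigvalsh(A) > 0))   # True: all eigenvalues are positive

try:
    np.linalg.cholesky(A)                  # succeeds only for positive definite matrices
    print("positive definite")
except np.linalg.LinAlgError:
    print("not positive definite")
```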
Eigendecomposition and Singular Value Decomposition
Eigenvectors & Eigenvalues
Core Concept: Eigenvectors are special vectors whose direction is unchanged by a linear transformation (they are only stretched or compressed); eigenvalues are the corresponding scaling factors along those directions.
- Core formula: $$ A\mathbf{v} = \lambda \mathbf{v} $$ (where \(A\) is the transformation matrix, \(\mathbf{v}\) is the eigenvector, and \(\lambda\) is the eigenvalue)
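A sketch computing eigenpairs and verifying the defining equation (the matrix is an arbitrary example with real eigenvalues):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(A)          # eigenvectors are the columns of eigvecs
for lam, v in zip(eigvals, eigvecs.T):
    print(lam, np.allclose(A @ v, lam * v))  # True for each eigenpair
```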
Matrix Diagonalization and Eigendecomposition
Core Concept: By changing the basis to consist of eigenvectors, a complex linear transformation is reduced to independent scaling along each axis (a diagonal matrix).
- Diagonalization condition: The \(n \times n\) matrix \(M\) must have \(n\) linearly independent eigenvectors (enough to form a basis of the space).
- Eigendecomposition formula: $$ M = ADA^{-1} $$ (the columns of \(A\) are the eigenvectors; \(D\) is a diagonal matrix whose diagonal entries are the corresponding eigenvalues)
- Applications:
- Real symmetric matrices are always diagonalizable (in fact, orthogonally diagonalizable).
- Computational speedup: \(M^k = A D^k A^{-1}\), which greatly simplifies the computation of high powers of a matrix.
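A sketch of the decomposition and the fast-power application, keeping the notation \(M = ADA^{-1}\) from above (the matrix \(M\) is an arbitrary diagonalizable example):

```python
import numpy as np

M = np.array([[4.0, 1.0],
              [2.0, 3.0]])                # eigenvalues 5 and 2

eigvals, A = np.linalg.eig(M)             # columns of A are the eigenvectors
D = np.diag(eigvals)

print(np.allclose(M, A @ D @ np.linalg.inv(A)))                  # True: M = A D A^{-1}
print(np.allclose(np.linalg.matrix_power(M, 5),
                  A @ np.diag(eigvals**5) @ np.linalg.inv(A)))   # True: M^5 = A D^5 A^{-1}
```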
Singular Value Decomposition (SVD)
Core Concept: SVD is the generalization of eigendecomposition. It decomposes any matrix into three successive operations: "rotate -- scale (and possibly change dimension) -- rotate."
- Core formula: $$ A = U \Sigma V^T $$

| Component | Matrix Properties | Geometric / Physical Meaning |
|---|---|---|
| \(U\) | \(m \times m\) orthogonal matrix | Basis change (rotation) in the output space |
| \(\Sigma\) | \(m \times n\) diagonal matrix | Scaling and dimensional transformation (diagonal entries \(\sigma_i\) are the singular values) |
| \(V^T\) | \(n \times n\) orthogonal matrix | Basis change (rotation) in the input space |

- Geometric intuition: For any matrix transformation, there exists an orthonormal basis of the input space whose images under the transformation remain mutually orthogonal.
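A sketch of the decomposition for a non-square matrix (the matrix is an arbitrary \(2 \times 3\) example):

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])          # maps R^3 into R^2

U, s, Vt = np.linalg.svd(A)              # full SVD: U is 2x2, Vt is 3x3
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)     # embed the singular values in a 2x3 "diagonal" matrix

print(np.allclose(A, U @ Sigma @ Vt))    # True: the factors reconstruct A
print(np.allclose(U @ U.T, np.eye(2)))   # True: U is orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(3))) # True: V is orthogonal
```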
Low-Rank Approximation
Core Concept: Using SVD, a large matrix can be decomposed into a sum of rank-one matrices. By retaining only the largest singular values, effective data compression is achieved.
- Approximation formula: $$ A \approx \sum_{i=1}^{r} \sigma_i \mathbf{u}_i \mathbf{v}_i^T \quad (r < \text{rank}(A)) $$
- Rationale: The smaller a singular value \(\sigma\) is, the less information the corresponding term carries. Discarding terms with negligible singular values enables denoising or image compression.
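A sketch of a rank-\(k\) truncation; the data and the choice \(k = 10\) are arbitrary, and the spectral-norm error of the truncation equals the first discarded singular value:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 40))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 10
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # sum of the k leading rank-one terms

print(np.linalg.matrix_rank(A_k))             # 10
print(np.linalg.norm(A - A_k, 2), s[k])       # both equal the first discarded singular value
```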
QR Decomposition
Any matrix \(A \in \mathbb{R}^{m \times n}\) (\(m \geq n\)) can be decomposed as: $$ A = QR $$
where \(Q\) is an orthogonal matrix (\(Q^T Q = I\)) and \(R\) is an upper triangular matrix.
- Gram-Schmidt Process: column-by-column orthogonalization.
- Applications:
  - Numerically stable solution of the least-squares problem \(\min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|^2\): solve \(R\mathbf{x} = Q^T \mathbf{b}\)
  - Eigenvalue computation (the QR algorithm)
  - Construction of orthonormal bases
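A sketch of QR and the least-squares use case on a randomly generated overdetermined system:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3))
b = rng.standard_normal(100)

Q, R = np.linalg.qr(A)           # reduced QR: Q is 100x3 with orthonormal columns, R is 3x3 upper triangular
x = np.linalg.solve(R, Q.T @ b)  # solve the small triangular system R x = Q^T b

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # True: agrees with lstsq
print(np.allclose(Q.T @ Q, np.eye(3)))                        # True: Q^T Q = I
```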
Matrix Calculus
Gradients and the Jacobian
- Scalar w.r.t. vector: \(\nabla_x f \in \mathbb{R}^n\) (gradient)
- Vector w.r.t. vector: \(J = \frac{\partial f}{\partial x} \in \mathbb{R}^{m \times n}\) (Jacobian)
- Chain rule: \(\frac{\partial f}{\partial x} = \frac{\partial f}{\partial g} \cdot \frac{\partial g}{\partial x}\)
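A sketch that checks the chain rule numerically with central-difference Jacobians (the functions \(f\), \(g\) and the evaluation point are made-up examples):

```python
import numpy as np

def g(x):                       # R^2 -> R^3
    return np.array([x[0] * x[1], x[0] ** 2, np.sin(x[1])])

def f(y):                       # R^3 -> R^2
    return np.array([y[0] + y[2], y[1] * y[2]])

def jacobian(func, x, eps=1e-6):
    """Numerical Jacobian of func at x via central differences."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        cols.append((func(x + dx) - func(x - dx)) / (2 * eps))
    return np.column_stack(cols)

x0 = np.array([0.5, 1.5])
J_chain = jacobian(f, g(x0)) @ jacobian(g, x0)      # chain rule: J_f(g(x0)) @ J_g(x0)
J_direct = jacobian(lambda x: f(g(x)), x0)
print(np.allclose(J_chain, J_direct, atol=1e-5))    # True
```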
Applications of Linear Algebra in ML
Least Squares Regression: Find the coefficient vector \(\mathbf{w}\) that minimizes \(\|X\mathbf{w} - \mathbf{y}\|^2\); when \(X^TX\) is invertible, the solution is given by the normal equations \(\hat{\mathbf{w}} = (X^TX)^{-1}X^T\mathbf{y}\).
Principal Component Analysis (PCA):
After centering the data matrix \(X\), compute the covariance matrix \(C = \frac{1}{n}X^TX\); its eigenvectors are the principal component directions. Equivalently, perform SVD on \(X\): \(X = U\Sigma V^T\), and the principal components are the first \(k\) columns of \(V\).
Condition Number: \(\kappa(A) = \frac{\sigma_{\max}}{\sigma_{\min}}\); a large condition number indicates numerical instability.
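A sketch tying the three ideas together on a small synthetic data set (all data are randomly generated for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.standard_normal(200)

# Least squares regression: minimize ||X w - y||^2
w = np.linalg.lstsq(X, y, rcond=None)[0]
print(w)                                    # close to the true coefficients

# PCA: center the data, take the SVD; the principal directions are the rows of Vt
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T                      # projection onto the first two components
print(scores.shape)                         # (200, 2)

# Condition number: ratio of the largest to the smallest singular value
print(np.linalg.cond(Xc), s[0] / s[-1])     # identical by definition
```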
See Probability Theory for the statistical interpretation of covariance matrices.