Coordinate Frames

Body Frame

The body frame is a coordinate system centered on the robot itself, moving and rotating together with it.

Imagine you are standing in a room:

World frame: The cardinal directions of the room — no matter how you turn your head, "north" is always north.
Body frame: "Forward" is always the direction you are facing, and "left" is always your left-hand side.

World Frame (fixed)                Body Frame (rotates with the robot)

    Y(北)                        Y(左)
    ^                            ^
    |                            |
    |                       机器人头 --> X(前)
    +-----> X(东)

Suppose the exit is located 2 meters to the robot's "front-left":

Representation	World Frame	Body Frame
Robot facing north	exit_dir = (−1, 1)	exit_dir = (1, 1) (front-left)
Robot facing south (180° turn)	exit_dir = (−1, 1) (unchanged!)	exit_dir = (−1, −1) (rear-right)
What the policy must learn	Same exit_dir, but must turn left when facing north and right when facing south → must combine yaw to decide	Directly tells the policy "exit is front-left" → just turn left

In the world frame, the policy network must learn on its own how to "read the yaw angle + read the exit's world-frame direction -> figure out which way to turn" -- an implicit trigonometric relationship. This is extremely difficult for a small [128, 64] network.

Homogeneous Transformation Matrix

A homogeneous transformation matrix is a \(4 \times 4\) matrix that simultaneously represents rotation and translation:

\[ T = \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

Where:

\(R \in SO(3)\): A \(3 \times 3\) rotation matrix
\(\mathbf{t} \in \mathbb{R}^3\): Translation vector

Key advantage: Unifies rotation and translation into matrix multiplication; multiple transformations are simply chained:

\[ T_{0n} = T_{01} \cdot T_{12} \cdot T_{23} \cdots T_{(n-1)n} \]

Inverse transformation:

\[ T^{-1} = \begin{bmatrix} R^T & -R^T\mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} \]

DH Parameters (Denavit-Hartenberg Convention)

DH parameters provide the standard method for establishing link coordinate frames in robot forward kinematics.

The Four Parameters

Parameter	Symbol	Definition
Joint angle	\(\theta_i\)	Rotation about \(z_{i-1}\), from \(x_{i-1}\) to \(x_i\)
Link offset	\(d_i\)	Translation along \(z_{i-1}\), from \(x_{i-1}\) to \(x_i\)
Link length	\(a_i\)	Translation along \(x_i\), from \(z_{i-1}\) to \(z_i\)
Link twist	\(\alpha_i\)	Rotation about \(x_i\), from \(z_{i-1}\) to \(z_i\)

For revolute joints, \(\theta_i\) is the variable; for prismatic joints, \(d_i\) is the variable.

Setup Procedure

Number the links: Starting from the base (link 0) to the end-effector
Define \(z\) axes: \(z_i\) points along the axis of joint \(i+1\)
Define origins: \(O_i\) is at the intersection of the common normal between \(z_{i-1}\) and \(z_i\) with \(z_i\)
Define \(x\) axes: \(x_i\) points along the common normal from \(z_{i-1}\) to \(z_i\)
Define \(y\) axes: Determined by the right-hand rule
Read parameters: Determine \(\theta_i, d_i, a_i, \alpha_i\) from the geometric relationships between frames

Single-Joint Transformation Matrix

\[ A_i = Rot(z, \theta_i) \cdot Trans(z, d_i) \cdot Trans(x, a_i) \cdot Rot(x, \alpha_i) \]

Expanded:

\[ A_i = \begin{bmatrix} \cos\theta_i & -\sin\theta_i\cos\alpha_i & \sin\theta_i\sin\alpha_i & a_i\cos\theta_i \\ \sin\theta_i & \cos\theta_i\cos\alpha_i & -\cos\theta_i\sin\alpha_i & a_i\sin\theta_i \\ 0 & \sin\alpha_i & \cos\alpha_i & d_i \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

Common Coordinate Frames

World Frame

Definition: A fixed global reference frame in the environment
Characteristics: Does not move or rotate with any object
Use cases: Global localization in multi-robot systems, map building
Convention: Typically \(Z\)-axis points up, \(X\)-axis points forward

Base Frame

Definition: A coordinate frame fixed to the robot's base
Characteristics: For fixed-base robots, coincides with or has a fixed offset from the world frame
Use cases: Starting point for kinematics calculations (Frame 0 in DH parameters)

Tool Frame (End-Effector Frame)

Definition: A coordinate frame fixed to the end-effector (e.g., gripper, welding torch)
Characteristics: Moves with the end-effector
Use cases: Defining the Tool Center Point (TCP), force control reference frame
TCP calibration: Determined through multi-point calibration methods to find the tool frame's offset from the flange

Camera Frame

Definition: Origin at the camera's optical center, \(Z\)-axis along the optical axis
Use cases: Visual servoing, eye-hand calibration

Rotation Representations

Rotation Matrix

\(R \in SO(3)\), satisfying \(R^T R = I\) and \(\det(R) = 1\)

Pros: No singularities, convenient for matrix operations
Cons: 9 parameters to represent 3 degrees of freedom (redundant)

Euler Angles

Three angles represent a rotation. Common conventions:

ZYX (Yaw-Pitch-Roll): First rotate about \(Z\) by \(\psi\) (yaw), then about \(Y\) by \(\theta\) (pitch), finally about \(X\) by \(\phi\) (roll)
ZYZ: Commonly used for describing manipulator end-effector orientation

\[ R_{ZYX} = R_z(\psi) R_y(\theta) R_x(\phi) \]

Gimbal Lock

When the pitch angle \(\theta = \pm 90°\), yaw and roll degenerate to the same rotation axis, losing one degree of freedom. This is an inherent limitation of the Euler angle representation.

Quaternion

A quaternion \(\mathbf{q} = w + xi + yj + zk = (w, \mathbf{v})\), where \(|\mathbf{q}| = 1\)

From axis-angle to quaternion (rotation by angle \(\theta\) about unit axis \(\hat{\mathbf{n}}\)):

\[ \mathbf{q} = \left(\cos\frac{\theta}{2}, \; \hat{\mathbf{n}}\sin\frac{\theta}{2}\right) \]

Quaternion rotation: Rotating a point \(\mathbf{p}\): \(\mathbf{p}' = \mathbf{q} \mathbf{p} \mathbf{q}^{-1}\)

Pros:

Only 4 parameters (3 degrees of freedom after normalization)
No gimbal lock
Rotation composition requires only quaternion multiplication
Convenient interpolation (Slerp -- spherical linear interpolation)

Cons:

Less intuitive than rotation matrices
Double cover: \(\mathbf{q}\) and \(-\mathbf{q}\) represent the same rotation

Conversion Summary

From \ To	Rotation Matrix	Euler Angles	Quaternion
Rotation Matrix	--	Extract angles (beware gimbal lock)	Shepperd's method
Euler Angles	Product of three rotation matrices	--	Convert to rotation matrix first
Quaternion	\(R = I + 2w[\mathbf{v}]_\times + 2[\mathbf{v}]_\times^2\)	Convert to rotation matrix first	--

Practical Examples

Example: Forward Kinematics of a 2-DOF Planar Manipulator

Consider a two-link planar manipulator with link lengths \(L_1\) and \(L_2\):

Joint	\(\theta_i\)	\(d_i\)	\(a_i\)	\(\alpha_i\)
1	\(q_1\)	0	\(L_1\)	0
2	\(q_2\)	0	\(L_2\)	0

End-effector position:

\[ x = L_1 \cos q_1 + L_2 \cos(q_1 + q_2) \]

\[ y = L_1 \sin q_1 + L_2 \sin(q_1 + q_2) \]

Example: Coordinate Transformation in RL

In reinforcement learning, converting target positions from the world frame to the body frame is a common operation:

import numpy as np

def world_to_body(target_world, robot_pos, robot_yaw):
    """Convert a target in world frame to body frame."""
    # Compute relative position
    delta = target_world - robot_pos

    # Rotate into body frame
    cos_yaw = np.cos(robot_yaw)
    sin_yaw = np.sin(robot_yaw)

    target_body = np.array([
        cos_yaw * delta[0] + sin_yaw * delta[1],
        -sin_yaw * delta[0] + cos_yaw * delta[1]
    ])
    return target_body

This way the policy network directly receives the target direction in the body frame without needing to learn the trigonometric relationship itself, greatly simplifying the learning problem.