Coordinate Frames
Body Frame
The body frame is a coordinate system centered on the robot itself, moving and rotating together with it.
Imagine you are standing in a room:
- World frame: The cardinal directions of the room — no matter how you turn your head, "north" is always north.
- Body frame: "Forward" is always the direction you are facing, and "left" is always your left-hand side.
World Frame (fixed) Body Frame (rotates with the robot)
Y(北) Y(左)
^ ^
| |
| 机器人头 --> X(前)
+-----> X(东)
Suppose the exit is located 2 meters to the robot's "front-left":
| Representation | World Frame | Body Frame |
|---|---|---|
| Robot facing north | exit_dir = (−1, 1) | exit_dir = (1, 1) (front-left) |
| Robot facing south (180° turn) | exit_dir = (−1, 1) (unchanged!) | exit_dir = (−1, −1) (rear-right) |
| What the policy must learn | Same exit_dir, but must turn left when facing north and right when facing south → must combine yaw to decide | Directly tells the policy "exit is front-left" → just turn left |
In the world frame, the policy network must learn on its own how to "read the yaw angle + read the exit's world-frame direction -> figure out which way to turn" -- an implicit trigonometric relationship. This is extremely difficult for a small [128, 64] network.
Homogeneous Transformation Matrix
A homogeneous transformation matrix is a \(4 \times 4\) matrix that simultaneously represents rotation and translation:
Where:
- \(R \in SO(3)\): A \(3 \times 3\) rotation matrix
- \(\mathbf{t} \in \mathbb{R}^3\): Translation vector
Key advantage: Unifies rotation and translation into matrix multiplication; multiple transformations are simply chained:
Inverse transformation:
DH Parameters (Denavit-Hartenberg Convention)
DH parameters provide the standard method for establishing link coordinate frames in robot forward kinematics.
The Four Parameters
| Parameter | Symbol | Definition |
|---|---|---|
| Joint angle | \(\theta_i\) | Rotation about \(z_{i-1}\), from \(x_{i-1}\) to \(x_i\) |
| Link offset | \(d_i\) | Translation along \(z_{i-1}\), from \(x_{i-1}\) to \(x_i\) |
| Link length | \(a_i\) | Translation along \(x_i\), from \(z_{i-1}\) to \(z_i\) |
| Link twist | \(\alpha_i\) | Rotation about \(x_i\), from \(z_{i-1}\) to \(z_i\) |
For revolute joints, \(\theta_i\) is the variable; for prismatic joints, \(d_i\) is the variable.
Setup Procedure
- Number the links: Starting from the base (link 0) to the end-effector
- Define \(z\) axes: \(z_i\) points along the axis of joint \(i+1\)
- Define origins: \(O_i\) is at the intersection of the common normal between \(z_{i-1}\) and \(z_i\) with \(z_i\)
- Define \(x\) axes: \(x_i\) points along the common normal from \(z_{i-1}\) to \(z_i\)
- Define \(y\) axes: Determined by the right-hand rule
- Read parameters: Determine \(\theta_i, d_i, a_i, \alpha_i\) from the geometric relationships between frames
Single-Joint Transformation Matrix
Expanded:
Common Coordinate Frames
World Frame
- Definition: A fixed global reference frame in the environment
- Characteristics: Does not move or rotate with any object
- Use cases: Global localization in multi-robot systems, map building
- Convention: Typically \(Z\)-axis points up, \(X\)-axis points forward
Base Frame
- Definition: A coordinate frame fixed to the robot's base
- Characteristics: For fixed-base robots, coincides with or has a fixed offset from the world frame
- Use cases: Starting point for kinematics calculations (Frame 0 in DH parameters)
Tool Frame (End-Effector Frame)
- Definition: A coordinate frame fixed to the end-effector (e.g., gripper, welding torch)
- Characteristics: Moves with the end-effector
- Use cases: Defining the Tool Center Point (TCP), force control reference frame
- TCP calibration: Determined through multi-point calibration methods to find the tool frame's offset from the flange
Camera Frame
- Definition: Origin at the camera's optical center, \(Z\)-axis along the optical axis
- Use cases: Visual servoing, eye-hand calibration
Rotation Representations
Rotation Matrix
\(R \in SO(3)\), satisfying \(R^T R = I\) and \(\det(R) = 1\)
- Pros: No singularities, convenient for matrix operations
- Cons: 9 parameters to represent 3 degrees of freedom (redundant)
Euler Angles
Three angles represent a rotation. Common conventions:
- ZYX (Yaw-Pitch-Roll): First rotate about \(Z\) by \(\psi\) (yaw), then about \(Y\) by \(\theta\) (pitch), finally about \(X\) by \(\phi\) (roll)
- ZYZ: Commonly used for describing manipulator end-effector orientation
Gimbal Lock
When the pitch angle \(\theta = \pm 90°\), yaw and roll degenerate to the same rotation axis, losing one degree of freedom. This is an inherent limitation of the Euler angle representation.
Quaternion
A quaternion \(\mathbf{q} = w + xi + yj + zk = (w, \mathbf{v})\), where \(|\mathbf{q}| = 1\)
From axis-angle to quaternion (rotation by angle \(\theta\) about unit axis \(\hat{\mathbf{n}}\)):
Quaternion rotation: Rotating a point \(\mathbf{p}\): \(\mathbf{p}' = \mathbf{q} \mathbf{p} \mathbf{q}^{-1}\)
Pros:
- Only 4 parameters (3 degrees of freedom after normalization)
- No gimbal lock
- Rotation composition requires only quaternion multiplication
- Convenient interpolation (Slerp -- spherical linear interpolation)
Cons:
- Less intuitive than rotation matrices
- Double cover: \(\mathbf{q}\) and \(-\mathbf{q}\) represent the same rotation
Conversion Summary
| From \ To | Rotation Matrix | Euler Angles | Quaternion |
|---|---|---|---|
| Rotation Matrix | -- | Extract angles (beware gimbal lock) | Shepperd's method |
| Euler Angles | Product of three rotation matrices | -- | Convert to rotation matrix first |
| Quaternion | \(R = I + 2w[\mathbf{v}]_\times + 2[\mathbf{v}]_\times^2\) | Convert to rotation matrix first | -- |
Practical Examples
Example: Forward Kinematics of a 2-DOF Planar Manipulator
Consider a two-link planar manipulator with link lengths \(L_1\) and \(L_2\):
| Joint | \(\theta_i\) | \(d_i\) | \(a_i\) | \(\alpha_i\) |
|---|---|---|---|---|
| 1 | \(q_1\) | 0 | \(L_1\) | 0 |
| 2 | \(q_2\) | 0 | \(L_2\) | 0 |
End-effector position:
Example: Coordinate Transformation in RL
In reinforcement learning, converting target positions from the world frame to the body frame is a common operation:
import numpy as np
def world_to_body(target_world, robot_pos, robot_yaw):
"""Convert a target in world frame to body frame."""
# Compute relative position
delta = target_world - robot_pos
# Rotate into body frame
cos_yaw = np.cos(robot_yaw)
sin_yaw = np.sin(robot_yaw)
target_body = np.array([
cos_yaw * delta[0] + sin_yaw * delta[1],
-sin_yaw * delta[0] + cos_yaw * delta[1]
])
return target_body
This way the policy network directly receives the target direction in the body frame without needing to learn the trigonometric relationship itself, greatly simplifying the learning problem.