Robot Arms and Mobile Manipulation
Overview
Robot arms (manipulators) are the core form of industrial robots, while mobile manipulation combines mobile bases with robot arms, granting robots the ability to grasp and manipulate objects in open environments. This is the central "hand" problem in embodied intelligence.
Robot Arm Fundamentals
Degrees of Freedom and Joint Types
- Revolute joint: Rotates about a fixed axis, most common
- Prismatic joint: Translates along a straight line
- Degrees of Freedom (DOF): A rigid end-effector pose in 3D space has 6 DOF (3 translation + 3 rotation), so 6 joints are the minimum needed to reach an arbitrary pose, while a 7-DOF arm is kinematically redundant
Kinematics
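Forward kinematics: Compute end-effector pose \(\mathbf{T}\) from joint angles \(\mathbf{q}\) via chained homogeneous transformation matrices:

\[ \mathbf{T}(\mathbf{q}) = \mathbf{T}_{0}^{1}(q_1)\,\mathbf{T}_{1}^{2}(q_2)\cdots\mathbf{T}_{n-1}^{n}(q_n) \]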
Each \(\mathbf{T}_{i-1}^{i}\) is determined by DH (Denavit-Hartenberg) parameters or the Product of Exponentials (PoE) method.
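As a minimal sketch of forward kinematics by chaining homogeneous transforms, the following Python builds each joint transform from standard DH parameters and multiplies them in order. The two-link planar arm and its link lengths are hypothetical values chosen for illustration, and the helper names are not taken from any library.

```python
import math

def dh_transform(a, alpha, d, theta):
    """Homogeneous transform between consecutive frames (standard DH convention)."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -st * ca, st * sa, a * ct],
        [st, ct * ca, -ct * sa, a * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ]

def matmul4(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def forward_kinematics(dh_rows, q):
    """Chain per-joint transforms; dh_rows holds (a, alpha, d) for each revolute joint."""
    T = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for (a, alpha, d), theta in zip(dh_rows, q):
        T = matmul4(T, dh_transform(a, alpha, d, theta))
    return T

# Hypothetical planar 2-link arm: link lengths 1.0 m and 0.5 m, no twist or offset.
PLANAR_2R = [(1.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
T = forward_kinematics(PLANAR_2R, [math.pi / 2, -math.pi / 2])
ee_x, ee_y = T[0][3], T[1][3]  # end-effector position from the last column
```

With the first joint at 90 degrees and the second at -90 degrees, the first link points along +y and the second along +x, so the end-effector sits at roughly (0.5, 1.0).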
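Inverse kinematics: Given a desired end-effector pose \(\mathbf{T}_{desired}\), solve for joint angles \(\mathbf{q}\). Analytical solutions exist only for specific kinematic structures (e.g., arms with a spherical wrist); general methods iterate numerically on the pose error \(\Delta\mathbf{x}_k\) between the desired and current end-effector pose:

\[ \mathbf{q}_{k+1} = \mathbf{q}_k + J^{\dagger}(\mathbf{q}_k)\,\Delta\mathbf{x}_k \]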
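where \(J^{\dagger}\) is the Moore-Penrose pseudoinverse of the Jacobian. When \(J\) is near singular, use Damped Least Squares:

\[ \Delta\mathbf{q} = J^{T}\left(J J^{T} + \lambda^{2} I\right)^{-1}\Delta\mathbf{x} \]

where \(\Delta\mathbf{x}\) is the end-effector pose error and \(\lambda > 0\) is a damping factor that keeps joint steps bounded at the cost of slower convergence.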
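The damped least-squares iteration can be sketched for a planar two-link arm, where the Jacobian is 2x2 and the damped inverse can be written out by hand. This is an illustrative sketch only: the link lengths, damping value, target, and initial guess are assumptions, and it solves for position rather than a full 6-DOF pose.

```python
import math

L1, L2 = 1.0, 0.5  # hypothetical link lengths in metres

def fk(q):
    """End-effector position of the planar 2-link arm."""
    q1, q2 = q
    return (L1 * math.cos(q1) + L2 * math.cos(q1 + q2),
            L1 * math.sin(q1) + L2 * math.sin(q1 + q2))

def jacobian(q):
    """Position Jacobian d(x, y) / d(q1, q2)."""
    q1, q2 = q
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-L1 * s1 - L2 * s12, -L2 * s12],
            [L1 * c1 + L2 * c12, L2 * c12]]

def dls_step(J, dx, lam):
    """Return J^T (J J^T + lam^2 I)^-1 dx for a 2x2 Jacobian."""
    a = J[0][0] ** 2 + J[0][1] ** 2 + lam ** 2
    b = J[0][0] * J[1][0] + J[0][1] * J[1][1]
    d = J[1][0] ** 2 + J[1][1] ** 2 + lam ** 2
    det = a * d - b * b  # strictly positive whenever lam > 0
    y0 = (d * dx[0] - b * dx[1]) / det
    y1 = (-b * dx[0] + a * dx[1]) / det
    return (J[0][0] * y0 + J[1][0] * y1, J[0][1] * y0 + J[1][1] * y1)

def solve_ik(target, q0, lam=0.05, tol=1e-6, max_iters=200):
    """Iterate damped least-squares updates until the position error is below tol."""
    q = list(q0)
    for _ in range(max_iters):
        x = fk(q)
        dx = (target[0] - x[0], target[1] - x[1])
        if math.hypot(dx[0], dx[1]) < tol:
            break
        dq = dls_step(jacobian(q), dx, lam)
        q[0] += dq[0]
        q[1] += dq[1]
    return q

TARGET = (0.8, 0.9)
q_sol = solve_ik(TARGET, q0=[0.5, 1.0])
reached = fk(q_sol)
```

A larger `lam` makes steps near singular configurations safer but slows convergence; production solvers typically also clamp step size and enforce joint limits, which this sketch omits.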
Dynamics
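Robot arm dynamics are described by the equations of motion obtained from the Lagrangian formulation:

\[ M(\mathbf{q})\ddot{\mathbf{q}} + C(\mathbf{q}, \dot{\mathbf{q}})\dot{\mathbf{q}} + G(\mathbf{q}) = \boldsymbol{\tau} \]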
- \(M(\mathbf{q})\): mass matrix (symmetric positive definite)
- \(C(\mathbf{q}, \dot{\mathbf{q}})\): Coriolis and centrifugal force matrix
- \(G(\mathbf{q})\): gravity term
- \(\boldsymbol{\tau}\): joint torques
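Computed Torque Control cancels the nonlinear dynamics with the model and applies PD feedback with gain matrices \(K_p\) and \(K_v\):

\[ \boldsymbol{\tau} = M(\mathbf{q})\left(\ddot{\mathbf{q}}_d + K_v\dot{\mathbf{e}} + K_p\mathbf{e}\right) + C(\mathbf{q}, \dot{\mathbf{q}})\dot{\mathbf{q}} + G(\mathbf{q}) \]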
where \(\mathbf{e} = \mathbf{q}_d - \mathbf{q}\) is the tracking error.
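To illustrate computed torque control, here is a minimal single-joint sketch: a pendulum-like link whose mass and gravity terms are known exactly, driven toward a constant target angle. All numerical values (mass, length, gains, time step) are assumptions for illustration, and the Coriolis term vanishes because there is only one joint.

```python
import math

LINK_MASS, LINK_LENGTH, GRAVITY = 1.0, 0.5, 9.81  # hypothetical point-mass link

def mass_term(q):
    """M(q) for a point mass at the end of a massless link."""
    return LINK_MASS * LINK_LENGTH ** 2

def gravity_term(q):
    """G(q), with q measured from the downward vertical."""
    return LINK_MASS * GRAVITY * LINK_LENGTH * math.sin(q)

def computed_torque(q, qd, q_des, qd_des, qdd_des, kp, kv):
    """tau = M(q) * (qdd_des + kv*e_dot + kp*e) + G(q); C is zero for one joint."""
    e = q_des - q
    e_dot = qd_des - qd
    return mass_term(q) * (qdd_des + kv * e_dot + kp * e) + gravity_term(q)

def simulate(q_des=1.0, kp=100.0, kv=20.0, dt=1e-3, steps=2000):
    """Track a constant target with a perfect model; returns the final joint angle."""
    q, qd = 0.0, 0.0
    for _ in range(steps):
        tau = computed_torque(q, qd, q_des, 0.0, 0.0, kp, kv)
        qdd = (tau - gravity_term(q)) / mass_term(q)  # plant dynamics
        qd += qdd * dt
        q += qd * dt
    return q

final_q = simulate()
```

Because the model matches the plant exactly, the error obeys a linear second-order equation set by `kp` and `kv` (critically damped with the values above). With model mismatch, the cancellation is imperfect and the error no longer decays this cleanly.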
Workspace and Singularities
- Reachable workspace: Set of all positions the end-effector can reach
- Dexterous workspace: Subset of positions reachable with arbitrary orientation
- Singular configurations: Configurations where the Jacobian loses rank, preventing motion in certain directions
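Manipulability measures the dexterity of the arm at its current configuration:

\[ w(\mathbf{q}) = \sqrt{\det\left(J(\mathbf{q})\,J(\mathbf{q})^{T}\right)} \]

The measure drops to \(w = 0\) at singular configurations.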
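As a small worked example, the manipulability measure (the square root of the determinant of the Jacobian times its transpose) can be evaluated for a planar two-link arm. The link lengths below are hypothetical; for this arm the measure reduces to the product of the link lengths times the absolute sine of the elbow angle, which gives a convenient check.

```python
import math

def jacobian(q, l1=1.0, l2=0.5):
    """Position Jacobian of a planar 2-link arm with assumed link lengths."""
    q1, q2 = q
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]

def manipulability(J):
    """sqrt(det(J J^T)) for a 2x2 Jacobian, clamped against rounding below zero."""
    a = J[0][0] ** 2 + J[0][1] ** 2
    b = J[0][0] * J[1][0] + J[0][1] * J[1][1]
    d = J[1][0] ** 2 + J[1][1] ** 2
    return math.sqrt(max(a * d - b * b, 0.0))

w_bent = manipulability(jacobian([0.3, math.pi / 2]))  # elbow at 90 degrees
w_straight = manipulability(jacobian([0.3, 0.0]))      # fully stretched, singular
```

The bent elbow gives the largest value for this arm (about 0.5), while the stretched configuration gives a value near zero, matching the loss of Jacobian rank at a singularity.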
Major Platforms
Research-Grade Robot Arms
| Platform | DOF | Payload | Features | Price Range |
|---|---|---|---|---|
| Franka Emika Panda | 7 | 3 kg | Torque sensors in all joints, impedance control | ~$30K |
| Kinova Gen3 | 7 | 4 kg | Lightweight, ROS2 support, force feedback | ~$25K |
| UR5e/UR10e | 6 | 5/12.5 kg | Collaborative robot pioneer, 6-axis F/T sensor | ~$35-50K |
| xArm 7 | 7 | 3.5 kg | Made in China, cost-effective, open-source SDK | ~$8-10K |
| UFACTORY Lite 6 | 6 | 2 kg | Ultra-low-price research arm | ~$2K |
| Koch v1.1 | 6 | - | Open-source low-cost, LeRobot community | ~$300 |
Franka Emika Panda Details
Franka Panda is one of the most widely used platforms in robot manipulation research:
- Joint torque sensors: All 7 joints have built-in high-precision torque sensors
- Impedance control: Supports Cartesian and joint-space impedance control
- libfranka: 1 kHz real-time control interface
- franka_ros2: Official ROS2 integration
- Applications: Widely used in grasping, manipulation, and contact-rich task research
Mobile Manipulation
Why Mobile Manipulation Is Needed
Fixed-base robot arms have limited workspace, yet many real tasks require robots to move through environments while manipulating objects:
- Household tidying (retrieving and placing items from different rooms)
- Warehouse logistics (moving to shelves for picking)
- Inspection and maintenance (moving to equipment for operations)
System Architecture
```mermaid
graph TB
    subgraph Perception_Layer["Perception Layer"]
        CAM[RGB-D Camera] --> DET[Object Detection/Segmentation]
        LID[LiDAR] --> MAP[Mapping/Localization]
        FT[Force/Torque Sensor] --> CONT[Contact Detection]
    end
    subgraph Planning_Layer["Planning Layer"]
        DET --> GRASP[Grasp Planning]
        MAP --> NAV[Navigation Planning]
        GRASP --> WBC[Whole-Body Planning]
        NAV --> WBC
    end
    subgraph Control_Layer["Control Layer"]
        WBC --> BASE[Base Control]
        WBC --> ARM[Arm Control]
        CONT --> ARM
        BASE --> MOT_B[Base Motors]
        ARM --> MOT_A[Arm Joint Motors]
    end
    subgraph Hardware
        MOT_B --> ROBOT[Mobile Manipulation Robot]
        MOT_A --> ROBOT
        ROBOT --> CAM
        ROBOT --> LID
        ROBOT --> FT
    end
```
Whole-Body Planning and Control
The core challenge of mobile manipulation is coordinating base motion with arm motion.
Approach 1: Hierarchical planning

1. Plan the base motion to reach a manipulation position first
2. After the base settles, plan the arm motion

This is simple but inefficient, and it is not suitable for dynamic tasks.
Approach 2: Whole-body motion planning
Unify base DOF (\(x, y, \theta\)) with arm DOF (\(q_1, \ldots, q_n\)) into a high-dimensional configuration space:

\[ \mathbf{q}_{whole} = [x, y, \theta, q_1, \ldots, q_n]^{T} \in \mathbb{R}^{3+n} \]
Use sampling-based planners like RRT/PRM in this space for joint planning.
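To make the combined configuration space concrete, the sketch below stacks a planar base pose and two arm joints into one vector, draws a uniform sample from it as a sampling-based planner would, and maps a configuration to the end-effector position in the world frame. The bounds, link lengths, and function names are assumptions for illustration; collision checking and the planner itself are omitted.

```python
import math
import random

# Hypothetical bounds: base x, y in metres; heading and two arm joints in radians.
BOUNDS = [(-5.0, 5.0), (-5.0, 5.0), (-math.pi, math.pi),
          (-math.pi / 2, math.pi / 2), (-2.5, 2.5)]

def sample_configuration(rng):
    """Draw one whole-body configuration [x, y, theta, q1, q2] uniformly."""
    return [rng.uniform(lo, hi) for lo, hi in BOUNDS]

def end_effector_xy(q_whole, l1=0.4, l2=0.3):
    """World-frame end-effector position for a planar base carrying a 2-link arm."""
    x, y, theta, q1, q2 = q_whole
    # Arm forward kinematics expressed in the base frame.
    arm_x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    arm_y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    # Rotate by the base heading and translate by the base position.
    return (x + math.cos(theta) * arm_x - math.sin(theta) * arm_y,
            y + math.sin(theta) * arm_x + math.cos(theta) * arm_y)

q_rand = sample_configuration(random.Random(0))
ee = end_effector_xy([1.0, 2.0, math.pi / 2, 0.0, 0.0])
```

With the base at (1, 2) facing +y and the arm stretched out, the end-effector lands at about (1.0, 2.7). The same target can be reached by many base and arm combinations, which is the redundancy a whole-body planner exploits.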
Approach 3: Optimization methods
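Use trajectory optimization (e.g., TrajOpt, CHOMP) to simultaneously optimize base and arm motion. A typical formulation minimizes a smoothness cost plus a weighted collision cost over the whole-body waypoints \(\mathbf{q}_1, \ldots, \mathbf{q}_T\), subject to joint limits and fixed start and goal configurations:

\[ \min_{\mathbf{q}_{1:T}} \; \sum_{t=1}^{T-1} \left\| \mathbf{q}_{t+1} - \mathbf{q}_t \right\|^{2} + \alpha \sum_{t=1}^{T} c_{obs}(\mathbf{q}_t) \quad \text{s.t.} \quad \mathbf{q}_{min} \le \mathbf{q}_t \le \mathbf{q}_{max} \]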
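The idea of trading smoothness against obstacle clearance can be sketched with plain gradient descent on a 2D point path. This is not TrajOpt or CHOMP themselves, only a toy in the same spirit: the cost is the sum of squared waypoint differences plus a quadratic penalty inside an assumed safety radius, and the obstacle, weights, and step size are hypothetical.

```python
import math

def optimize_path(start, goal, obstacle, safe_dist=0.3, n_points=21,
                  obs_weight=10.0, step=0.02, iters=500):
    """Gradient descent on smoothness + obstacle cost with fixed endpoints."""
    # Straight-line initial guess between start and goal.
    path = [[start[0] + (goal[0] - start[0]) * k / (n_points - 1),
             start[1] + (goal[1] - start[1]) * k / (n_points - 1)]
            for k in range(n_points)]
    for _ in range(iters):
        grads = []
        for t in range(1, n_points - 1):
            p_prev, p, p_next = path[t - 1], path[t], path[t + 1]
            # Gradient of sum ||p[t+1] - p[t]||^2 with respect to waypoint t.
            g = [2.0 * (2.0 * p[i] - p_prev[i] - p_next[i]) for i in range(2)]
            # Obstacle cost 0.5 * (safe_dist - d)^2, active only inside safe_dist.
            diff = [p[0] - obstacle[0], p[1] - obstacle[1]]
            d = max(math.hypot(diff[0], diff[1]), 1e-9)
            if d < safe_dist:
                for i in range(2):
                    g[i] -= obs_weight * (safe_dist - d) * diff[i] / d
            grads.append(g)
        # Update interior waypoints only, so start and goal stay fixed.
        for t, g in zip(range(1, n_points - 1), grads):
            path[t][0] -= step * g[0]
            path[t][1] -= step * g[1]
    return path

def clearance(path, obstacle):
    """Smallest waypoint-to-obstacle distance along the path."""
    return min(math.hypot(p[0] - obstacle[0], p[1] - obstacle[1]) for p in path)

OBSTACLE = (0.5, 0.05)  # hypothetical point obstacle just off the straight line
path = optimize_path((0.0, 0.0), (1.0, 0.0), OBSTACLE)
final_clearance = clearance(path, OBSTACLE)
```

The straight-line guess passes 0.05 from the obstacle; after optimization the path bends away from it while staying smooth. For mobile manipulation the waypoints would be whole-body configurations and the obstacle cost would come from a collision checker or signed distance field.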
Representative Mobile Manipulation Platforms
| Platform | Composition | Features | Application |
|---|---|---|---|
| Hello Robot Stretch | Differential base + telescoping arm | Lightweight, ~$25K, clean design | Home assistance research |
| Fetch Mobile Manipulator | Differential base + 7-DOF arm | Classic research platform | Discontinued, extensive prior work |
| Mobile ALOHA | AgileX base + dual ViperX arms | Low-cost dual-arm teleop, open-source | Imitation learning, household |
| Google Everyday Robots | Mobile base + 7-DOF arm | Internal R&D, RT-1/RT-2 | Office cleaning |
| TIAGo (PAL Robotics) | Differential base + 7-DOF arm | Commercial research platform, ROS integration | Service/research |
| PR2 (Willow Garage) | Omnidirectional base + dual 7-DOF arms | Historical classic, ROS origin platform | Discontinued |
Grasping
Grasping Problem Classification
```mermaid
graph TD
    A[Robot Grasping] --> B[Analytical Methods]
    A --> C[Learning-based Methods]
    B --> B1[Force Closure Analysis]
    B --> B2[Form Closure Analysis]
    B --> B3[Grasp Quality Metrics]
    C --> C1[Image-based<br/>GG-CNN, GraspNet]
    C --> C2[Point Cloud-based<br/>Contact-GraspNet, AnyGrasp]
    C --> C3[Diffusion Model-based<br/>Diffusion Policy]
    C --> C4[Language-guided<br/>VLM + Grasping]
    B1 --> D[Known Object Model]
    C1 --> E[Unknown Object Generalization]
    C2 --> E
```
Force Closure and Grasp Quality
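Force closure: A grasp has force closure if contact forces that stay inside the friction cones at the contact points can balance any external disturbance wrench (force and torque) on the object.
Given contact force \(\mathbf{f}_i\) at contact point \(i\), with normal component \(f_i^{n}\), tangential component \(\mathbf{f}_i^{t}\), and friction coefficient \(\mu\), the friction cone constraint is:

\[ \left\| \mathbf{f}_i^{t} \right\| \le \mu\, f_i^{n}, \qquad f_i^{n} \ge 0 \]

Mapping contact forces to the object frame's wrench space:

\[ \mathbf{w}_i = \begin{bmatrix} \mathbf{f}_i \\ [p_i]_\times\, \mathbf{f}_i \end{bmatrix} \]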
where \([p_i]_\times\) is the skew-symmetric matrix of the contact point position vector.
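The friction cone test and the force-to-wrench mapping can be sketched in a few lines. The contact position, normal, force, and friction coefficients below are made-up values, the function names are illustrative, and the normal is assumed to be a unit vector pointing into the object.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def in_friction_cone(force, normal, mu):
    """True if the tangential part of the force is at most mu times the normal part."""
    f_n = dot(force, normal)                       # normal component (scalar)
    f_t = [f - f_n * n for f, n in zip(force, normal)]  # tangential component
    return f_n >= 0.0 and math.sqrt(dot(f_t, f_t)) <= mu * f_n

def contact_wrench(position, force):
    """Stack force and torque (position x force) into a 6D wrench."""
    return list(force) + cross(position, force)

# Hypothetical contact on the +x side of an object, pushing inward with some shear.
p = [0.1, 0.0, 0.0]
n = [-1.0, 0.0, 0.0]
f = [-1.0, 0.2, 0.0]
ok_rough = in_friction_cone(f, n, mu=0.5)     # shear 0.2 <= 0.5 * 1.0
ok_slippery = in_friction_cone(f, n, mu=0.1)  # shear 0.2 >  0.1 * 1.0
w = contact_wrench(p, f)
```

The same force is admissible on a high-friction surface and slips on a low-friction one. The resulting wrench carries the force in its first three entries and a torque of about 0.02 about the z axis in its last entry.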
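Grasp quality metric: The convex hull of the contact wrenches produced by bounded (e.g., unit-magnitude) contact forces within the friction cones forms the feasible wrench set \(\mathcal{W}\), with quality:

\[ Q = \min_{\mathbf{w} \in \partial\mathcal{W}} \left\| \mathbf{w} \right\| \]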
i.e., the minimum distance from the origin to the boundary of the feasible wrench space. \(Q > 0\) indicates force closure; larger \(Q\) means more robust grasps.
Learning-based Grasping
GraspNet / AnyGrasp:
- Input: Single/multi-frame point clouds
- Output: Large number of candidate grasp poses (\(SE(3)\)) with quality scores
- Training data: Large-scale synthetic data + analytical grasp annotations
- Feature: Strong generalization to unseen objects

Contact-GraspNet:
- Direct contact grasp prediction on point clouds
- 6-DOF grasp pose generation
- Fast, suitable for real-time applications
Grasping Pipeline
Typical robot grasping workflow:
- Perception: RGB-D to obtain scene point cloud
- Segmentation: Instance segmentation to isolate target object
- Grasp detection: Generate candidate grasp poses
- Motion planning: Plan collision-free path to grasp pose
- Execution: Execute grasp and verify
Impedance Control and Force Control
When robot arms interact with the environment, pure position control can produce excessive contact forces. Impedance control makes the arm end-effector behave like a mass-spring-damper system:

\[ M_d\ddot{\mathbf{e}} + D_d\dot{\mathbf{e}} + K_d\mathbf{e} = \mathbf{f}_{ext} \]
- \(M_d, D_d, K_d\): desired inertia, damping, stiffness matrices
- \(\mathbf{e} = \mathbf{x} - \mathbf{x}_d\): position error
- \(\mathbf{f}_{ext}\): external force
Advantage: Behavior can be switched from rigid (high \(K_d\)) to compliant (low \(K_d\)) by adjusting stiffness.
Application scenarios: Wiping surfaces, connector insertion/extraction, collaborative carrying, and other force-controlled tasks.
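The effect of the stiffness setting can be seen in a one-axis simulation of the mass-spring-damper target behavior under a constant external force. This is a minimal sketch with assumed values (unit desired inertia, critical damping, a 5 N push); a real controller would realize this behavior through joint torques rather than integrate it directly.

```python
import math

def simulate_impedance(stiffness, f_ext=5.0, mass=1.0, dt=1e-3, duration=3.0):
    """Integrate M_d*e_ddot + D_d*e_dot + K_d*e = f_ext for one axis; returns final e."""
    damping = 2.0 * math.sqrt(stiffness * mass)  # critical damping (assumed choice)
    e, e_dot = 0.0, 0.0
    for _ in range(int(duration / dt)):
        e_ddot = (f_ext - damping * e_dot - stiffness * e) / mass
        e_dot += e_ddot * dt
        e += e_dot * dt
    return e

stiff_offset = simulate_impedance(stiffness=1000.0)  # rigid behavior
soft_offset = simulate_impedance(stiffness=50.0)     # compliant behavior
```

At steady state the deflection equals the external force divided by the stiffness, so the stiff setting yields about 5 mm and the compliant one about 100 mm under the same push.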
Frontier Directions
Foundation Model-Driven Manipulation
- RT-1 / RT-2 (Google): Robot manipulation policies trained on large-scale data
- Octo (UC Berkeley): Open-source general manipulation foundation model
- OpenVLA: Vision-language-action model, generating actions directly from language instructions
- Diffusion Policy: Diffusion models for action generation, handling multi-modal action distributions
Teleoperation and Data Collection
- ALOHA / Mobile ALOHA: Low-cost dual-arm teleoperation systems in which the operator moves leader arms and the follower arms mirror their joint positions
- UMI (Universal Manipulation Interface): Hand-held gripper for data collection, no robot needed for demonstrations
- Open-TeleVision: VR headset teleoperation, supporting dexterous hands
References
- Siciliano et al., Robotics: Modelling, Planning and Control, Springer, 2009
- Lynch & Park, Modern Robotics: Mechanics, Planning, and Control, Cambridge University Press, 2017
- Fang et al., "AnyGrasp: Robust and Efficient Grasp Perception in Spatial and Temporal Domains", T-RO, 2023
- Chi et al., "Diffusion Policy: Visuomotor Policy Learning via Action Diffusion", RSS, 2023