Robot Arms and Mobile Manipulation

Overview

Robot arms (manipulators) are the core form of industrial robots, while mobile manipulation combines mobile bases with robot arms, granting robots the ability to grasp and manipulate objects in open environments. This is the central "hand" problem in embodied intelligence.

Robot Arm Fundamentals

Degrees of Freedom and Joint Types

Revolute joint: Rotates about a fixed axis, most common
Prismatic joint: Translates along a straight line
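Degrees of Freedom (DOF): An end-effector pose in 3D space has 6 DOF (3 translation + 3 rotation), so a 6-DOF arm has just enough joints to reach an arbitrary pose within its workspace, while a 7-DOF arm is kinematically redundant (infinitely many joint configurations can realize the same pose)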

Kinematics

Forward kinematics: Compute end-effector pose $\mathbf{T}$ from joint angles $\mathbf{q}$ via chained homogeneous transformation matrices:

\[ \mathbf{T}_{0}^{n} = \prod_{i=1}^{n} \mathbf{T}_{i-1}^{i}(q_i) \]

Each $\mathbf{T}_{i-1}^{i}$ is determined by DH (Denavit-Hartenberg) parameters or the Product of Exponentials (PoE) method.
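As a minimal sketch of the chained product above, the following assumes a planar arm with revolute joints, so each transform is a 3x3 homogeneous matrix rather than a full 4x4 DH or PoE transform. The function names and the two-link example are illustrative assumptions, not taken from any library.

```python
import math

def link_transform(theta, length):
    """Planar homogeneous transform for one revolute joint and its link:
    rotate by theta, then translate `length` along the rotated x-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, length * c],
            [s, c, length * s],
            [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def forward_kinematics(joint_angles, link_lengths):
    """Chain the per-joint transforms: T_0^n = T_0^1(q_1) ... T_{n-1}^n(q_n)."""
    pose = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for theta, length in zip(joint_angles, link_lengths):
        pose = matmul(pose, link_transform(theta, length))
    return pose

# Example: two links (1.0 m and 0.5 m), shoulder at +90 deg, elbow at -90 deg.
T = forward_kinematics([math.pi / 2, -math.pi / 2], [1.0, 0.5])
x, y = T[0][2], T[1][2]                   # end-effector position
heading = math.atan2(T[1][0], T[0][0])    # end-effector orientation
```

The position column of the result matches the familiar closed form $x = l_1 \cos q_1 + l_2 \cos(q_1 + q_2)$, $y = l_1 \sin q_1 + l_2 \sin(q_1 + q_2)$.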

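Inverse kinematics: Given a desired end-effector pose $\mathbf{T}_{desired}$, solve for joint angles $\mathbf{q}$. Closed-form (analytical) solutions exist only for particular kinematic structures, such as 6-DOF arms whose last three axes intersect in a spherical wrist; general methods iterate numerically on the pose error $\Delta \mathbf{x}$: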

\[ \Delta \mathbf{q} = J^{\dagger}(\mathbf{q}) \cdot \Delta \mathbf{x} \]

where $J^{\dagger}$ is the Moore-Penrose pseudoinverse of the Jacobian. When $J$ is near singular, use Damped Least Squares:

\[ \Delta \mathbf{q} = J^T(JJ^T + \lambda^2 I)^{-1} \Delta \mathbf{x} \]
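The damped least squares update can be sketched for a planar two-link arm, where $J$ is 2x2 and the inverse of $JJ^T + \lambda^2 I$ can be written out by hand. The unit link lengths, the damping value, and the clamp on the task-space step (a common practical safeguard, not part of the formula above) are assumptions chosen for illustration.

```python
import math

def forward_kinematics(q, l1=1.0, l2=1.0):
    """End-effector position of a planar two-link arm."""
    return (l1 * math.cos(q[0]) + l2 * math.cos(q[0] + q[1]),
            l1 * math.sin(q[0]) + l2 * math.sin(q[0] + q[1]))

def jacobian(q, l1=1.0, l2=1.0):
    """2x2 position Jacobian of the same arm."""
    s1, c1 = math.sin(q[0]), math.cos(q[0])
    s12, c12 = math.sin(q[0] + q[1]), math.cos(q[0] + q[1])
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]

def dls_step(J, dx, damping):
    """Compute dq = J^T (J J^T + lambda^2 I)^{-1} dx for a 2x2 Jacobian."""
    a = J[0][0] ** 2 + J[0][1] ** 2 + damping ** 2
    b = J[0][0] * J[1][0] + J[0][1] * J[1][1]
    d = J[1][0] ** 2 + J[1][1] ** 2 + damping ** 2
    det = a * d - b * b            # strictly positive because damping > 0
    y0 = (d * dx[0] - b * dx[1]) / det
    y1 = (-b * dx[0] + a * dx[1]) / det
    return [J[0][0] * y0 + J[1][0] * y1, J[0][1] * y0 + J[1][1] * y1]

def solve_ik(target, q_init, damping=0.05, max_task_step=0.2,
             tol=1e-6, max_iters=200):
    """Iterate DLS updates until the position error falls below tol."""
    q = list(q_init)
    for _ in range(max_iters):
        x = forward_kinematics(q)
        dx = [target[0] - x[0], target[1] - x[1]]
        err = math.hypot(dx[0], dx[1])
        if err < tol:
            break
        if err > max_task_step:    # clamp large errors to keep steps local
            dx = [v * max_task_step / err for v in dx]
        dq = dls_step(jacobian(q), dx, damping)
        q = [q[0] + dq[0], q[1] + dq[1]]
    return q

q_sol = solve_ik(target=(1.0, 1.0), q_init=[0.5, 1.0])
reached = forward_kinematics(q_sol)
```

A larger damping value makes the update more robust near singularities at the cost of slower convergence, which is the trade-off the $\lambda$ term encodes.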

Dynamics

Robot arm dynamics, derived from the Euler-Lagrange equations, take the standard form:

\[ M(\mathbf{q})\ddot{\mathbf{q}} + C(\mathbf{q}, \dot{\mathbf{q}})\dot{\mathbf{q}} + G(\mathbf{q}) = \boldsymbol{\tau} \]

$M(\mathbf{q})$: mass matrix (symmetric positive definite)
$C(\mathbf{q}, \dot{\mathbf{q}})$: Coriolis and centrifugal force matrix
$G(\mathbf{q})$: gravity term
$\boldsymbol{\tau}$: joint torques

Computed Torque Control:

\[ \boldsymbol{\tau} = M(\mathbf{q})(\ddot{\mathbf{q}}_d + K_d \dot{\mathbf{e}} + K_p \mathbf{e}) + C(\mathbf{q}, \dot{\mathbf{q}})\dot{\mathbf{q}} + G(\mathbf{q}) \]

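where $\mathbf{e} = \mathbf{q}_d - \mathbf{q}$ is the tracking error and $K_p$, $K_d$ are positive-definite gain matrices. If the model terms $M$, $C$, and $G$ are accurate, substituting this torque into the dynamics cancels the nonlinearities and leaves the linear error dynamics $\ddot{\mathbf{e}} + K_d \dot{\mathbf{e}} + K_p \mathbf{e} = 0$.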
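To illustrate the computed torque law, here is a minimal simulation sketch for a single-joint pendulum, where $M = m l^2$, the Coriolis term vanishes, and $G = m g l \sin q$. The physical constants, gains, reference trajectory, and the assumption of a perfect model are all illustrative choices.

```python
import math

MASS, LENGTH, GRAVITY = 1.0, 0.5, 9.81   # assumed pendulum parameters

def simulate_computed_torque(kp=100.0, kd=20.0, dt=0.001, duration=5.0):
    """Track q_d(t) = 0.5 sin(t) and return the final absolute tracking error."""
    inertia = MASS * LENGTH ** 2          # M(q) for a point-mass pendulum
    q, q_dot = 0.3, 0.0                   # start away from the reference
    steps = int(round(duration / dt))
    for k in range(steps):
        t = k * dt
        q_des, qd_des, qdd_des = 0.5 * math.sin(t), 0.5 * math.cos(t), -0.5 * math.sin(t)
        e, e_dot = q_des - q, qd_des - q_dot
        gravity = MASS * GRAVITY * LENGTH * math.sin(q)   # G(q)
        # Computed torque: feedback-linearizing term plus gravity compensation.
        tau = inertia * (qdd_des + kd * e_dot + kp * e) + gravity
        # Plant dynamics (same model, so cancellation is exact here).
        q_ddot = (tau - gravity) / inertia
        q_dot += q_ddot * dt              # semi-implicit Euler integration
        q += q_dot * dt
    return abs(0.5 * math.sin(steps * dt) - q)

final_error = simulate_computed_torque()
```

With these gains the error dynamics are critically damped, so the initial offset decays quickly. On a real arm, model mismatch means the cancellation is only approximate and some residual error remains.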

Workspace and Singularities

Reachable workspace: Set of all positions the end-effector can reach
Dexterous workspace: Subset of positions reachable with arbitrary orientation
Singular configurations: Configurations where the Jacobian loses rank, preventing motion in certain directions

Manipulability measures the dexterity of the arm at its current configuration:

\[ w(\mathbf{q}) = \sqrt{\det(J(\mathbf{q})J(\mathbf{q})^T)} \]
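A short sketch of the manipulability measure for a planar two-link arm follows. For this square Jacobian the measure reduces to $|\det J| = l_1 l_2 |\sin q_2|$, so it peaks with the elbow at 90 degrees and drops to zero when the arm is fully stretched. Link lengths and function names are assumptions for illustration.

```python
import math

def jacobian(q, l1=1.0, l2=1.0):
    """2x2 position Jacobian of a planar two-link arm."""
    s1, c1 = math.sin(q[0]), math.cos(q[0])
    s12, c12 = math.sin(q[0] + q[1]), math.cos(q[0] + q[1])
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]

def manipulability(q, l1=1.0, l2=1.0):
    """Yoshikawa manipulability w = sqrt(det(J J^T))."""
    J = jacobian(q, l1, l2)
    a = J[0][0] ** 2 + J[0][1] ** 2
    b = J[0][0] * J[1][0] + J[0][1] * J[1][1]
    d = J[1][0] ** 2 + J[1][1] ** 2
    # Clamp at zero to guard against tiny negative values from round-off.
    return math.sqrt(max(a * d - b * b, 0.0))

w_bent = manipulability([0.3, math.pi / 2])   # elbow at 90 degrees
w_straight = manipulability([0.3, 0.0])       # fully stretched: singular
```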

Major Platforms

Research-Grade Robot Arms

Platform	DOF	Payload	Features	Price Range
Franka Emika Panda	7	3 kg	Torque sensors in all joints, impedance control	~$30K
Kinova Gen3	7	4 kg	Lightweight, ROS2 support, force feedback	~$25K
UR5e/UR10e	6	5/12.5 kg	Collaborative robot pioneer, 6-axis F/T sensor	~$35-50K
xArm 7	7	3.5 kg	Good price-to-performance ratio (UFACTORY, China), open-source SDK	~$8-10K
UFACTORY Lite 6	6	2 kg	Very low-cost research arm	~$2K
Koch v1.1	6	-	Open-source low-cost, LeRobot community	~$300

Franka Emika Panda Details

Franka Panda is one of the most widely used platforms in robot manipulation research:

Joint torque sensors: All 7 joints have built-in high-precision torque sensors
Impedance control: Supports Cartesian and joint-space impedance control
libfranka: C++ library providing a 1 kHz real-time control interface
franka_ros2: Official ROS2 integration
Applications: Widely used in grasping, manipulation, and contact-rich task research

Mobile Manipulation

Why Mobile Manipulation Is Needed

Fixed-base robot arms have limited workspace, yet many real tasks require robots to move through environments while manipulating objects:

Household tidying (retrieving and placing items from different rooms)
Warehouse logistics (moving to shelves for picking)
Inspection and maintenance (moving to equipment for operations)

System Architecture

graph TB
    subgraph Perception_Layer["Perception Layer"]
        CAM[RGB-D Camera] --> DET[Object Detection/Segmentation]
        LID[LiDAR] --> MAP[Mapping/Localization]
        FT[Force/Torque Sensor] --> CONT[Contact Detection]
    end

    subgraph Planning_Layer["Planning Layer"]
        DET --> GRASP[Grasp Planning]
        MAP --> NAV[Navigation Planning]
        GRASP --> WBC[Whole-Body Planning]
        NAV --> WBC
    end

    subgraph Control_Layer["Control Layer"]
        WBC --> BASE[Base Control]
        WBC --> ARM[Arm Control]
        CONT --> ARM
        BASE --> MOT_B[Base Motors]
        ARM --> MOT_A[Arm Joint Motors]
    end

    subgraph Hardware
        MOT_B --> ROBOT[Mobile Manipulation Robot]
        MOT_A --> ROBOT
        ROBOT --> CAM
        ROBOT --> LID
        ROBOT --> FT
    end

Whole-Body Planning and Control

The core challenge of mobile manipulation is coordinating base motion with arm motion.

Approach 1: Hierarchical planning

1. Plan the base motion to reach the manipulation position first
2. After the base settles, plan the arm motion

This approach is simple but inefficient and not suitable for dynamic tasks.

Approach 2: Whole-body motion planning

Unify base DOF ($x, y, \theta$) with arm DOF ($q_1, ..., q_n$) into a high-dimensional configuration space:

\[ \mathbf{q}_{full} = [x, y, \theta, q_1, q_2, ..., q_n]^T \]

Use sampling-based planners like RRT/PRM in this space for joint planning.
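The sketch below shows a bare-bones RRT over $\mathbf{q}_{full}$ for a base with a 2-joint arm, so the configuration space is 5-dimensional. It makes several simplifying assumptions: Euclidean distance over all coordinates (angle wrap-around is ignored), collision checks only at new nodes rather than along edges, and a hypothetical collision function that only keeps the base clear of a round pillar.

```python
import math
import random

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def steer(q_from, q_to, step):
    """Move from q_from toward q_to by at most `step`."""
    d = distance(q_from, q_to)
    if d <= step:
        return list(q_to)
    return [a + step * (b - a) / d for a, b in zip(q_from, q_to)]

def rrt(start, goal, bounds, is_free, step=0.2, goal_bias=0.1,
        max_iters=5000, seed=0):
    """Grow a tree from start; return a list of configurations or None."""
    rng = random.Random(seed)
    nodes, parents = [list(start)], [None]
    for _ in range(max_iters):
        if rng.random() < goal_bias:
            target = list(goal)           # occasionally pull toward the goal
        else:
            target = [rng.uniform(lo, hi) for lo, hi in bounds]
        nearest = min(range(len(nodes)), key=lambda i: distance(nodes[i], target))
        q_new = steer(nodes[nearest], target, step)
        if not is_free(q_new):            # node-only check (simplification)
            continue
        nodes.append(q_new)
        parents.append(nearest)
        if distance(q_new, goal) < 1e-9:  # goal itself was added to the tree
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parents[i]
            return path[::-1]
    return None

def base_clear_of_pillar(q):
    """Hypothetical collision check: base (x, y) stays outside a pillar."""
    return math.hypot(q[0] - 1.0, q[1] + 0.5) > 0.3

# q_full = [x, y, theta, q1, q2]
BOUNDS = [(-1.0, 3.0), (-2.0, 2.0), (-math.pi, math.pi),
          (-math.pi, math.pi), (-math.pi, math.pi)]
START = [0.0, 0.0, 0.0, 0.0, 0.0]
GOAL = [2.0, 1.0, 1.57, 0.5, -0.5]
path = rrt(START, GOAL, BOUNDS, base_clear_of_pillar)
```

A practical planner would also respect the base's nonholonomic constraints, check edges for collisions, and use a proper metric for angular coordinates.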

Approach 3: Optimization methods

Use trajectory optimization (e.g., TrajOpt, CHOMP) to simultaneously optimize base and arm motion:

\[ \min_{\mathbf{q}_{0:T}} \sum_{t=0}^{T} \left[ c_{task}(\mathbf{q}_t) + c_{collision}(\mathbf{q}_t) \right] + \sum_{t=1}^{T} c_{smooth}(\mathbf{q}_t, \mathbf{q}_{t-1}) \]
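As a rough sketch of this kind of objective, the code below runs plain gradient descent with step halving on a 2-D waypoint trajectory. It is not TrajOpt or CHOMP: the task cost is replaced by fixed start and goal waypoints, the smoothness cost is the squared distance between consecutive waypoints, and the collision cost is a squared hinge penalty around a single hypothetical disc obstacle. All constants are assumptions.

```python
import math

OBSTACLE = (0.5, 0.0)     # hypothetical disc obstacle center
SAFE_RADIUS = 0.3         # penalize waypoints closer than this
W_COLLISION = 10.0        # weight of the collision term

def total_cost(traj):
    cost = 0.0
    for t in range(1, len(traj)):         # smoothness between neighbors
        cost += (traj[t][0] - traj[t - 1][0]) ** 2 + (traj[t][1] - traj[t - 1][1]) ** 2
    for x, y in traj:                     # hinge penalty near the obstacle
        d = math.hypot(x - OBSTACLE[0], y - OBSTACLE[1])
        cost += W_COLLISION * max(0.0, SAFE_RADIUS - d) ** 2
    return cost

def gradient(traj):
    grad = [[0.0, 0.0] for _ in traj]
    for t in range(1, len(traj) - 1):     # endpoints stay fixed
        for k in range(2):
            grad[t][k] = 2.0 * (2.0 * traj[t][k] - traj[t - 1][k] - traj[t + 1][k])
        dx, dy = traj[t][0] - OBSTACLE[0], traj[t][1] - OBSTACLE[1]
        d = math.hypot(dx, dy)
        if 0.0 < d < SAFE_RADIUS:
            scale = -2.0 * W_COLLISION * (SAFE_RADIUS - d) / d
            grad[t][0] += scale * dx
            grad[t][1] += scale * dy
    return grad

def optimize(traj, step=0.05, iters=300):
    """Gradient descent that halves the step whenever the cost fails to drop."""
    traj = [list(p) for p in traj]
    cost = total_cost(traj)
    for _ in range(iters):
        grad = gradient(traj)
        candidate = [[p[0] - step * g[0], p[1] - step * g[1]]
                     for p, g in zip(traj, grad)]
        new_cost = total_cost(candidate)
        if new_cost < cost:
            traj, cost = candidate, new_cost
        else:
            step *= 0.5
    return traj, cost

# Initial guess: a straight line nudged off the obstacle's center line.
initial = [[i / 10.0, 0.05 if 0 < i < 10 else 0.0] for i in range(11)]
initial_cost = total_cost(initial)
optimized, final_cost = optimize(initial)
```

For a mobile manipulator, each waypoint would be the full configuration $[x, y, \theta, q_1, ..., q_n]$ and the collision term would come from a signed distance query on the robot geometry.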

Representative Mobile Manipulation Platforms

Platform	Composition	Features	Application
Hello Robot Stretch	Differential base + telescoping arm	Lightweight, ~$25K, clean design	Home assistance research
Fetch Mobile Manipulator	Differential base + 7-DOF arm	Classic research platform	Discontinued, extensive prior work
Mobile ALOHA	AgileX base + dual ViperX arms	Low-cost dual-arm teleop, open-source	Imitation learning, household
Google Everyday Robots	Mobile base + 7-DOF arm	Internal R&D, RT-1/RT-2	Office cleaning
TIAGo (PAL Robotics)	Differential base + 7-DOF arm	Commercial research platform, ROS integration	Service/research
PR2 (Willow Garage)	Omnidirectional base + dual 7-DOF arms	Historical classic, ROS origin platform	Discontinued

Grasping

Grasping Problem Classification

graph TD
    A[Robot Grasping] --> B[Analytical Methods]
    A --> C[Learning-based Methods]

    B --> B1[Force Closure Analysis]
    B --> B2[Form Closure Analysis]
    B --> B3[Grasp Quality Metrics]

    C --> C1[Image-based<br/>GG-CNN, GraspNet]
    C --> C2[Point Cloud-based<br/>Contact-GraspNet, AnyGrasp]
    C --> C3[Diffusion Model-based<br/>Diffusion Policy]
    C --> C4[Language-guided<br/>VLM + Grasping]

    B1 --> D[Known Object Model]
    C1 --> E[Unknown Object Generalization]
    C2 --> E

Force Closure and Grasp Quality

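Force closure: A grasp is in force closure when contact forces lying inside the friction cones at the contact points can be combined to resist any external disturbance wrench (force and torque) on the object.

Given contact force $\mathbf{f}_i$ at contact point $i$, expressed in a contact frame whose $z$-axis is the inward surface normal, the friction cone constraint with friction coefficient $\mu$ is: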

\[ \sqrt{f_{ix}^2 + f_{iy}^2} \leq \mu f_{iz}, \quad f_{iz} \geq 0 \]

Mapping contact forces to the object frame's wrench space:

\[ \mathbf{w}_i = G_i \mathbf{f}_i, \quad G_i = \begin{bmatrix} I \\ [p_i]_\times \end{bmatrix} \]

where $[p_i]_\times$ is the skew-symmetric matrix of the contact point position vector.
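The two formulas above can be sketched directly: a friction cone membership test and the mapping $\mathbf{w}_i = [\mathbf{f}_i; \mathbf{p}_i \times \mathbf{f}_i]$. For simplicity this example assumes the contact frame is aligned with the object frame, so the same force vector is used in both; in general the force must first be rotated into the object frame. Function names and numbers are illustrative.

```python
import math

def in_friction_cone(f, mu):
    """Check sqrt(fx^2 + fy^2) <= mu * fz with fz >= 0 (z is the contact normal)."""
    fx, fy, fz = f
    return fz >= 0.0 and math.hypot(fx, fy) <= mu * fz

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def contact_wrench(p, f):
    """Map a contact force at position p to a 6-D wrench [force; torque]."""
    return list(f) + cross(p, f)

force = (0.0, 0.2, 1.0)       # mostly normal force with a small tangential part
position = (0.1, 0.0, 0.0)    # contact point relative to the object origin
inside = in_friction_cone(force, mu=0.5)
slipping = in_friction_cone((0.6, 0.0, 1.0), mu=0.5)
wrench = contact_wrench(position, force)
```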

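Grasp quality metric: Positive linear combinations of the contact wrenches span an unbounded cone, so the contact forces are first bounded (e.g., total normal force at most 1). The resulting set of achievable wrenches is the feasible wrench set $\mathcal{W}$, a convex set whose quality (the Ferrari-Canny epsilon metric) is: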

\[ Q = \min_{\mathbf{w} \in \partial \mathcal{W}} \|\mathbf{w}\| \]

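i.e., the minimum distance from the origin to the boundary of the feasible wrench space, taken with the origin inside $\mathcal{W}$ (equivalently, the radius of the largest origin-centered ball contained in $\mathcal{W}$). $Q > 0$ indicates force closure; larger $Q$ means a more robust grasp.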
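Computing $Q$ requires a convex hull in 6-D wrench space, which is beyond a short example. As a simpler stand-in, the sketch below checks force closure for a planar two-finger grasp using the classical antipodal condition (Nguyen): two frictional point contacts achieve force closure exactly when the line joining them lies strictly inside both friction cones. This is a different, special-case test rather than the $Q$ metric itself, and all names are illustrative.

```python
import math

def angle_between(u, v):
    """Unsigned angle between two 2-D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    cos_angle = dot / (math.hypot(u[0], u[1]) * math.hypot(v[0], v[1]))
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def antipodal_force_closure(p1, n1, p2, n2, mu):
    """Planar two-contact force closure test.

    p1, p2: contact positions; n1, n2: inward surface normals;
    mu: Coulomb friction coefficient (cone half-angle is atan(mu)).
    """
    half_angle = math.atan(mu)
    line_12 = (p2[0] - p1[0], p2[1] - p1[1])
    line_21 = (-line_12[0], -line_12[1])
    return (angle_between(n1, line_12) < half_angle and
            angle_between(n2, line_21) < half_angle)

# Opposing contacts on a unit-width object: a classic antipodal grasp.
good = antipodal_force_closure((-1.0, 0.0), (1.0, 0.0), (1.0, 0.0), (-1.0, 0.0), mu=0.5)
# Second contact shifted sideways: the connecting line leaves the cones.
bad = antipodal_force_closure((-1.0, 0.0), (1.0, 0.0), (1.0, 1.5), (-1.0, 0.0), mu=0.5)
```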

Learning-based Grasping

GraspNet / AnyGrasp:

Input: Single- or multi-frame point clouds
Output: A large set of candidate grasp poses in $SE(3)$ with quality scores
Training data: Large-scale RGB-D scenes (the GraspNet-1Billion dataset) with analytically computed grasp annotations
Feature: Strong generalization to unseen objects

Contact-GraspNet:

Direct contact grasp prediction on point clouds
6-DOF grasp pose generation
Fast, suitable for real-time applications

Grasping Pipeline

Typical robot grasping workflow:

Perception: RGB-D to obtain scene point cloud
Segmentation: Instance segmentation to isolate target object
Grasp detection: Generate candidate grasp poses
Motion planning: Plan collision-free path to grasp pose
Execution: Execute grasp and verify
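The hand-off between grasp detection and motion planning in the workflow above can be sketched as a ranking step: sort candidates by score and keep the best one that passes a feasibility check. The `GraspCandidate` type, the `select_grasp` helper, and the table-height check are hypothetical placeholders, not the interface of any grasp detector or planner.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

@dataclass
class GraspCandidate:
    position: Tuple[float, float, float]   # grasp center in the robot base frame
    score: float                           # detector confidence in [0, 1]

def select_grasp(candidates: Sequence[GraspCandidate],
                 is_feasible: Callable[[GraspCandidate], bool],
                 min_score: float = 0.5) -> Optional[GraspCandidate]:
    """Return the highest-scoring feasible candidate, or None if none qualifies."""
    ranked = sorted(candidates, key=lambda g: g.score, reverse=True)
    for grasp in ranked:
        if grasp.score < min_score:
            break                          # remaining candidates score even lower
        if is_feasible(grasp):
            return grasp
    return None

def above_table(grasp: GraspCandidate) -> bool:
    """Stand-in feasibility check: reject grasps below the table surface."""
    return grasp.position[2] > 0.0

candidates = [
    GraspCandidate((0.4, 0.0, -0.02), 0.9),   # best score but below the table
    GraspCandidate((0.4, 0.1, 0.05), 0.8),
    GraspCandidate((0.5, 0.0, 0.10), 0.3),
]
chosen = select_grasp(candidates, above_table)
```

In a full system the feasibility check would typically call inverse kinematics and a collision-aware motion planner instead of a height test.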

Impedance Control and Force Control

When robot arms interact with the environment, pure position control can produce excessive contact forces. Impedance control instead makes the end-effector respond to external forces like a mass-spring-damper system:

\[ M_d \ddot{\mathbf{e}} + D_d \dot{\mathbf{e}} + K_d \mathbf{e} = \mathbf{f}_{ext} \]

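$M_d, D_d, K_d$: desired inertia, damping, and stiffness matrices (here $K_d$ denotes the desired stiffness, not the derivative gain used in computed torque control)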
$\mathbf{e} = \mathbf{x} - \mathbf{x}_d$: position error
$\mathbf{f}_{ext}$: external force

Advantage: Behavior can be switched from rigid (high $K_d$) to compliant (low $K_d$) by adjusting stiffness.
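A one-dimensional simulation sketch makes the stiffness trade-off concrete: under a constant external force the displacement settles at $e = f_{ext} / K_d$, so a stiffer setting yields less than a softer one. The scalar parameters and the choice of critical damping are assumptions for illustration.

```python
import math

def simulate_impedance(m_d, d_d, k_d, f_ext, dt=0.001, duration=3.0):
    """Integrate M_d e'' + D_d e' + K_d e = f_ext and return the final displacement."""
    e, e_dot = 0.0, 0.0
    for _ in range(int(round(duration / dt))):
        e_ddot = (f_ext - d_d * e_dot - k_d * e) / m_d
        e_dot += e_ddot * dt               # semi-implicit Euler integration
        e += e_dot * dt
    return e

def critical_damping(m_d, k_d):
    """Damping that gives the fastest non-oscillatory response."""
    return 2.0 * math.sqrt(m_d * k_d)

PUSH = 5.0  # constant external force in newtons (assumed)
soft = simulate_impedance(1.0, critical_damping(1.0, 100.0), 100.0, PUSH)
stiff = simulate_impedance(1.0, critical_damping(1.0, 1000.0), 1000.0, PUSH)
```

Here the compliant setting deflects about 5 cm under the push while the stiff one deflects about 5 mm, which is why low stiffness is preferred for contact-rich tasks and high stiffness for precise free-space positioning.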

Application scenarios: Wiping surfaces, connector insertion/extraction, collaborative carrying, and other force-controlled tasks.

Frontier Directions

Foundation Model-Driven Manipulation

RT-1 / RT-2 (Google): Robot manipulation policies trained on large-scale data
Octo (UC Berkeley): Open-source general manipulation foundation model
OpenVLA: Open-source vision-language-action model that generates actions directly from camera images and language instructions
Diffusion Policy: Diffusion models for action generation, handling multi-modal action distributions

Teleoperation and Data Collection

ALOHA / Mobile ALOHA: Low-cost dual-arm teleoperation systems in which the operator moves leader arms and the follower arms mirror the motion
UMI (Universal Manipulation Interface): Hand-held gripper for data collection, no robot needed for demonstrations
Open-TeleVision: VR headset teleoperation, supporting dexterous hands

References

Siciliano et al., Robotics: Modelling, Planning and Control, Springer, 2009
Lynch & Park, Modern Robotics: Mechanics, Planning, and Control, Cambridge University Press, 2017
Fang et al., "AnyGrasp: Robust and Efficient Grasp Perception in Spatial and Temporal Domains", T-RO, 2023
Chi et al., "Diffusion Policy: Visuomotor Policy Learning via Action Diffusion", RSS, 2023
