
Computing Platforms

Robot systems span multiple computational tiers: from microsecond-level real-time control to second-level AI inference. This article surveys embedded AI computing platforms, edge inference accelerators, cloud training resources, and heterogeneous computing architectures.


Computational Requirements by Tier

```mermaid
graph LR
    subgraph RT["Real-time Control Layer"]
        MCU[MCU / FPGA<br/>1kHz - 10kHz<br/>Motor control, sensor readout]
    end

    subgraph Edge["Edge Inference Layer"]
        JETSON[Jetson / Edge AI<br/>30-100Hz<br/>Perception, planning, inference]
    end

    subgraph Cloud["Cloud Training Layer"]
        GPU[GPU Cluster<br/>A100 / H100<br/>Model training, large-scale simulation]
    end

    MCU -- "EtherCAT / CAN<br/>< 1ms" --> JETSON
    JETSON -- "WiFi / 5G<br/>10-100ms" --> GPU

    style RT fill:#ffebee
    style Edge fill:#e8f5e9
    style Cloud fill:#e3f2fd
```
| Tier | Latency Requirement | Computation Type | Typical Hardware |
|---|---|---|---|
| Real-time control | <1 ms (>1 kHz) | Fixed algorithms, PID, state machines | MCU (STM32), FPGA |
| Perception inference | 10-100 ms (10-100 Hz) | Neural network inference, SLAM | Jetson, edge AI |
| High-level planning | 100 ms - 1 s | Motion planning, task planning | Jetson AGX, industrial PC |
| Model training | Hours to days | Large-scale RL/VLA training | GPU cluster |
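The tier boundaries can be felt directly on a desktop OS. The sketch below (an illustration, not from any vendor SDK) measures how far a plain user-space Python loop overshoots its scheduled ticks; on a non-real-time OS the worst case routinely blows the <1 ms budget, which is why the top tier lives on an MCU or FPGA.

```python
import time

def measure_loop_jitter(target_hz: float, iterations: int = 200) -> float:
    """Run a fixed-rate loop and return the worst-case deadline overshoot in ms."""
    period = 1.0 / target_hz
    worst_error = 0.0
    next_tick = time.perf_counter() + period
    for _ in range(iterations):
        # Busy-wait until the scheduled tick (best case for a user-space loop)
        while time.perf_counter() < next_tick:
            pass
        error = time.perf_counter() - next_tick  # how late we actually are
        worst_error = max(worst_error, error)
        next_tick += period
    return worst_error * 1000.0

if __name__ == "__main__":
    print(f"worst-case overshoot at 1 kHz: {measure_loop_jitter(1000.0):.3f} ms")
```

Even this busy-waiting loop is at the mercy of the OS scheduler; a hardware timer on an MCU has none of this variance.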

NVIDIA Jetson Series

Jetson is the de facto standard for robot AI computing. The current main product line is the Orin series.

Orin Series Comparison

| Model | GPU | CPU | AI Performance | Memory | Storage Interface | Power | Ref. Price |
|---|---|---|---|---|---|---|---|
| Orin Nano 4GB | 512 CUDA cores | 6-core A78AE | 20 TOPS | 4GB LPDDR5 | NVMe | 7-15W | ~$199 |
| Orin Nano 8GB | 1024 CUDA cores | 6-core A78AE | 40 TOPS | 8GB LPDDR5 | NVMe | 7-15W | ~$299 |
| Orin NX 8GB | 1024 CUDA cores | 6-core A78AE | 70 TOPS | 8GB LPDDR5 | NVMe | 10-25W | ~$399 |
| Orin NX 16GB | 1024 CUDA cores | 8-core A78AE | 100 TOPS | 16GB LPDDR5 | NVMe | 10-25W | ~$599 |
| AGX Orin 32GB | 1792 CUDA cores | 8-core A78AE | 200 TOPS | 32GB LPDDR5 | NVMe | 15-50W | ~$999 |
| AGX Orin 64GB | 2048 CUDA cores | 12-core A78AE | 275 TOPS | 64GB LPDDR5 | NVMe | 15-60W | ~$1,599 |

Jetson Nano (Legacy)

The original Jetson Nano (128 CUDA Maxwell, 472 GFLOPS FP16, 4GB LPDDR4, 5-10W, ~$149) is gradually being replaced by the Orin Nano, but remains widely used in educational settings.

JetPack SDK

JetPack is the complete SDK for Jetson, including:

| Component | Description |
|---|---|
| L4T | Linux for Tegra (Ubuntu-based) |
| CUDA | GPU computing |
| cuDNN | Deep learning acceleration |
| TensorRT | Inference optimization engine (FP16/INT8 quantization, layer fusion) |
| VPI | Vision Programming Interface |
| Multimedia API | Hardware codec |
| DeepStream | Video analytics pipeline |
| Isaac ROS | Robot-specific ROS 2 acceleration packages |

Version Mapping

| JetPack | L4T | CUDA | Supported Hardware |
|---|---|---|---|
| 5.1.x | R35.x | 11.4 | All Orin series |
| 6.0+ | R36.x | 12.2+ | All Orin series |

Deployment Optimization

```bash
# Set high-performance mode
sudo nvpmodel -m 0     # MAXN mode (maximum performance)
sudo jetson_clocks     # Lock clocks at maximum frequency

# TensorRT model optimization: build an FP16 engine with a 4GB workspace
# (note: inline comments cannot follow a trailing backslash)
trtexec --onnx=model.onnx \
        --saveEngine=model.engine \
        --fp16 \
        --workspace=4096
```
```python
# Python TensorRT inference example
import tensorrt as trt
import pycuda.autoinit  # noqa: F401 -- creates the CUDA context
import pycuda.driver as cuda

# Load the serialized engine
logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open("model.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
# Next steps: allocate host/device buffers for each binding, copy inputs to
# the GPU, run inference with the execution context, and copy outputs back.
# Typical latency: ~5ms on Orin NX (INT8), ~20ms on Orin Nano (FP16)
```

Jetson Selection Recommendations

| Application | Recommended Model | Rationale |
|---|---|---|
| Education/entry-level | Orin Nano 8GB | Low cost, sufficient |
| Service robot (navigation + avoidance) | Orin NX 16GB | Balanced performance and power |
| Robot arm manipulation (VLA inference) | AGX Orin 32GB | Large models require large memory |
| Humanoid robot | AGX Orin 64GB | Multi-modal perception + whole-body control |
| Autonomous driving prototype | AGX Orin 64GB | Multi-sensor fusion |

Next Generation: Jetson Thor

NVIDIA has announced Jetson Thor, a next-generation robot computing platform based on the Blackwell GPU architecture:

| Metric | AGX Orin | Thor (Expected) |
|---|---|---|
| AI Performance | 275 TOPS | 800+ TOPS |
| Memory | 64GB | 128GB |
| GPU Architecture | Ampere | Blackwell |
| Target Application | General robotics | Humanoid robot foundation models |

Other Edge AI Platforms

Intel Movidius (Integrated into OpenVINO)

| Feature | Description |
|---|---|
| Chip | Myriad X VPU |
| Performance | ~4 TOPS |
| Power | ~1W |
| Features | Ultra-low power, USB accelerator stick form factor |
| SDK | Intel OpenVINO |
| Status | Standalone chips no longer produced; integrated into Intel platforms |

Google Coral TPU

| Feature | Description |
|---|---|
| Chip | Edge TPU |
| Performance | 4 TOPS (INT8) |
| Power | ~2W |
| Form Factor | USB accelerator / Dev Board / M.2 module |
| SDK | TensorFlow Lite |
| Features | INT8-dedicated, extremely low inference latency |

Comparison

| Platform | Performance | Power | Ecosystem | Flexibility | Price |
|---|---|---|---|---|---|
| Jetson Orin Nano | 40 TOPS | 15W | CUDA/TensorRT | Very high | $299 |
| Coral TPU | 4 TOPS | 2W | TF Lite | Low | $60 |
| OpenVINO (Intel) | ~5 TOPS | 5W | OpenVINO | Medium | $80 |
| Hailo-8 | 26 TOPS | 3W | Hailo SDK | Medium | ~$100 |
| Rockchip RK3588 | 6 TOPS | 5-10W | RKNN | Medium | ~$100 |

Onboard vs Cloud Computing

In robot systems, different tasks have different latency requirements, necessitating proper partitioning of local and cloud computation.

Latency Requirements and Compute Location

| Control Tier | Frequency Req. | Latency Tolerance | Compute Location | Example |
|---|---|---|---|---|
| Low-level motor control | 1-10 kHz | <1 ms | Local (FPGA/MCU) | PID torque control |
| Mid-level motion control | 100-500 Hz | 2-10 ms | Local (Jetson/MCU) | Trajectory tracking |
| High-level policy inference | 10-50 Hz | 20-100 ms | Local (Jetson) | Visual policy inference |
| Language understanding/planning | 0.1-1 Hz | 100 ms - seconds | Cloud or local | VLM task planning |
| Training/fine-tuning | Offline | Unconstrained | Cloud | Policy model training |

Key Principle: 1 kHz-level control loops must run locally and must never depend on the network; 10 Hz-level high-level planning may use cloud assistance, but always with a local fallback mechanism.
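The fallback principle can be sketched in a few lines. Here `cloud_planner` and `local_planner` are hypothetical callables standing in for a remote VLM service and an onboard policy; the only requirement is that the local path always answers.

```python
import concurrent.futures
import time

def plan_with_fallback(observation, cloud_planner, local_planner, deadline_s=0.1):
    """Return the cloud plan if it arrives within the deadline,
    otherwise fall back to the onboard policy."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(cloud_planner, observation)
    try:
        return future.result(timeout=deadline_s)
    except concurrent.futures.TimeoutError:
        return local_planner(observation)  # deadline missed: stay autonomous
    finally:
        pool.shutdown(wait=False)  # do not block on the straggling request

# A "cloud" planner that is too slow for a 50 ms deadline:
slow_cloud = lambda obs: (time.sleep(0.5), "cloud_plan")[1]
fast_local = lambda obs: "local_plan"
print(plan_with_fallback({}, slow_cloud, fast_local, deadline_s=0.05))  # local_plan
```

A production system would additionally cancel or rate-limit stale cloud requests, but the deadline-plus-fallback shape is the core of the principle.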


Cloud / Workstation GPUs

Training Card Comparison

| GPU | VRAM | FP16 TFLOPS | Interconnect | Price | Typical Use |
|---|---|---|---|---|---|
| RTX 4090 | 24GB | 330 | PCIe | ~$1,600 | Personal research |
| A100 80GB | 80GB | 312 | NVLink | ~$15,000 | Lab training |
| H100 80GB | 80GB | 990 | NVLink/NVSwitch | ~$30,000 | Large-scale training |
| H200 | 141GB HBM3e | 990 | NVLink | ~$35,000 | VLA large models |

GPU Requirements for Robot AI

| Task | Model Scale | Minimum GPU | Recommended GPU |
|---|---|---|---|
| Small RL policy training | <10M params | RTX 3060 | RTX 4090 |
| Isaac Lab parallel training | — | RTX 3080 | A100 |
| VLA fine-tuning (7B) | 7B params | A100 40GB | 2x A100 80GB |
| VLA pretraining | 7B+ params | 8x A100 | 8x H100 |
| Real-time VLA inference | 3B params | Jetson AGX Orin | — |
| Real-time small model inference | <100M params | Jetson Orin NX | — |
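The memory implications of the table can be sanity-checked with rule-of-thumb arithmetic. The constants below are assumed heuristics, not vendor figures: ~2 bytes/param for FP16 weights at inference, and ~16 bytes/param for mixed-precision Adam training (FP16 weights and gradients plus FP32 master weights and two optimizer moments).

```python
def inference_vram_gb(params_b: float, bytes_per_param: float = 2.0) -> float:
    """Weights-only VRAM for inference (FP16 = 2 bytes/param).
    Activations and KV cache add on top of this."""
    return params_b * bytes_per_param  # billions of params * bytes = GB

def training_vram_gb(params_b: float, bytes_per_param: float = 16.0) -> float:
    """Rule-of-thumb mixed-precision Adam footprint (~16 bytes/param)."""
    return params_b * bytes_per_param

print(inference_vram_gb(7))  # 14.0 -> a 7B FP16 model needs ~14 GB for weights alone
print(training_vram_gb(7))   # 112.0 -> fine-tuning wants multiple 80 GB cards
```

The 112 GB training estimate is consistent with the table's "2x A100 80GB" recommendation for 7B fine-tuning (memory-saving techniques like LoRA or ZeRO can lower it substantially).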

Inference Optimization Pipeline

Common optimization techniques for deploying models at the edge:

  1. Quantization: FP32 -> FP16 -> INT8, reducing computational requirements 2-8x
  2. Distillation: Transfer large model knowledge to smaller models
  3. Pruning: Remove redundant weights
  4. TensorRT optimization: Layer fusion, memory optimization, 2-5x inference speedup
```
Training (A100/H100, FP32)
        | export ONNX
        v
TensorRT conversion (FP16/INT8)
        | deploy to Jetson
        v
Inference (Jetson Orin, INT8): 50ms -> 8ms, 400MB -> 100MB
```
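Step 1 (quantization) can be illustrated with a minimal symmetric per-tensor INT8 scheme in NumPy. Real toolchains such as TensorRT use per-channel scales and calibration data, so this is only a sketch of the idea:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor INT8 quantization: w ~= scale * q."""
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
err = float(np.abs(w - dequantize(q, scale)).max())
print(f"memory: {w.nbytes} -> {q.nbytes} bytes (4x), max abs error {err:.4f}")
```

The 4x memory reduction (FP32 to INT8) comes for free; the engineering effort goes into keeping the rounding error from degrading task accuracy, which is what calibration addresses.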

FPGA in Robotics

FPGAs are used for real-time control scenarios requiring microsecond-level deterministic latency.

Typical Applications

| Application | Description | Latency Requirement |
|---|---|---|
| Motor FOC control | Field-oriented control, PWM generation | <10 us |
| EtherCAT master | Real-time industrial communication | <1 ms |
| Sensor preprocessing | Encoder counting, ADC sampling | <1 us |
| Safety monitoring | Force/position limits, emergency stop | <10 us |

Common FPGA Platforms

| Platform | Chip | Features | Price | Application |
|---|---|---|---|---|
| Xilinx Zynq-7000 | ARM + FPGA | SoC, embedded + logic | ~$200 | Motor control |
| Intel Cyclone V | ARM + FPGA | Low-cost SoC | ~$150 | Education/prototype |
| Xilinx Kria KV260 | Zynq UltraScale+ | Vision AI + real-time control | ~$250 | Robot vision |
| Lattice iCE40 | iCE40 | Ultra-low power, open-source toolchain | ~$50 | Simple control logic |

FPGA vs MCU Comparison

| Feature | FPGA | MCU (STM32, etc.) |
|---|---|---|
| Latency | <1 us | 1-100 us |
| Parallelism | True hardware parallelism | Pseudo-parallelism (interrupts) |
| Development difficulty | High (HDL/Verilog) | Low (C/C++) |
| Flexibility | Hardware reconfigurable | Fixed architecture |
| Cost | Higher | Low |
| Typical scenario | Multi-axis synchronous control | Single-axis PID control |

Power Budget for Mobile Robots

Mobile robots have limited battery capacity; computing platform power directly affects battery life.

Typical Power Distribution (Mobile Manipulation Robot)

| Subsystem | Power Share | Typical Power |
|---|---|---|
| Mobile base motors | 40-50% | 50-200 W |
| Robot arm motors | 20-30% | 20-100 W |
| Computing platform | 10-20% | 10-60 W |
| Sensors | 5-10% | 5-20 W |
| Communication | 2-5% | 2-10 W |
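A quick runtime estimate follows directly from such a budget. The wattages below are illustrative values picked from the ranges in the table, not measurements:

```python
def runtime_hours(battery_wh: float, loads_w: dict) -> float:
    """Estimate runtime as battery capacity divided by total average draw."""
    return battery_wh / sum(loads_w.values())

budget = {                # mid-range mobile manipulator (illustrative)
    "base_motors": 120.0,
    "arm_motors": 60.0,
    "compute": 40.0,      # e.g. an AGX Orin at a mid power mode
    "sensors": 12.0,
    "comms": 5.0,
}
print(f"{runtime_hours(500.0, budget):.2f} h on a 500 Wh pack")
```

The computing platform's 40 W is a minority share here, but it is the one term the software team controls directly, which is why the optimization strategies below focus on it.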

Power Optimization Strategies

  1. Dynamic frequency scaling: Lower GPU/CPU frequency when idle (nvpmodel to switch power modes)
  2. On-demand model loading: Switch to lightweight models when complex inference is unnecessary
  3. Sensor sleep: Non-essential sensors can sample intermittently
  4. Mixed-precision inference: Use lower precision for non-critical tasks (INT8 reduces power ~40% vs FP16)

Heterogeneous Computing Architecture

Real robot systems typically employ heterogeneous computing architectures with multi-tier hardware collaboration:

| Tier | Hardware | Communication | Function |
|---|---|---|---|
| Level 0 | MCU (STM32H7) | CAN/SPI | Motor control (10 kHz) |
| Level 1 | FPGA (optional) | EtherCAT | Real-time safety monitoring |
| Level 2 | Jetson (Orin) | Ethernet/USB3 | AI inference + ROS 2 |
| Level 3 | Cloud GPU | WiFi/5G | Training, remote monitoring |
```
+------------------------------------------+
|             Cloud (GPU Cluster)          |
|         Train VLA / Large-scale sim      |
+-----------------+------------------------+
                  | WiFi / 5G
+-----------------+------------------------+
|          Jetson AGX Orin (ROS2)          |
|    Perception | SLAM | Planning | VLA    |
+----+----------+----------+---------------+
     | USB3     | Ethernet | EtherCAT
+----+----+ +---+---+ +---+--------------+
| Camera  | | LiDAR | | MCU (STM32)      |
| D435i   | | Mid360| | Motor FOC 10kHz  |
+---------+ +-------+ | Encoder readout  |
                      | Safety limits    |
                      +------------------+
```

Selection Decision Process

  1. Determine inference model size: Parameter count determines minimum memory requirements
  2. Determine inference frequency: Control >100Hz requires high compute; perception at 30Hz is less demanding
  3. Power budget: Mobile robots are strictly constrained; fixed installations are more relaxed
  4. ROS2 support: Jetson ecosystem is most comprehensive
  5. Cost constraints: Orin Nano for education, AGX Orin for research

For a more detailed selection framework, see Hardware Selection Guide.


