Robot systems span multiple computational tiers: from microsecond-level real-time control to second-level AI inference. This article surveys embedded AI computing platforms, edge inference accelerators, cloud training resources, and heterogeneous computing architectures.
Computational Requirements by Tier
```mermaid
graph LR
    subgraph RT["Real-time Control Layer"]
        MCU[MCU / FPGA<br/>1kHz - 10kHz<br/>Motor control, sensor readout]
    end
    subgraph Edge["Edge Inference Layer"]
        JETSON[Jetson / Edge AI<br/>30-100Hz<br/>Perception, planning, inference]
    end
    subgraph Cloud["Cloud Training Layer"]
        GPU[GPU Cluster<br/>A100 / H100<br/>Model training, large-scale simulation]
    end
    MCU -- "EtherCAT / CAN<br/>< 1ms" --> JETSON
    JETSON -- "WiFi / 5G<br/>10-100ms" --> GPU
    style RT fill:#ffebee
    style Edge fill:#e8f5e9
    style Cloud fill:#e3f2fd
```
| Tier | Latency Requirement | Computation Type | Typical Hardware |
|---|---|---|---|
| Real-time control | <1 ms (>1 kHz) | Fixed algorithms, PID, state machines | MCU (STM32), FPGA |
| Perception inference | 10-100 ms (10-100 Hz) | Neural network inference, SLAM | Jetson, edge AI |
| High-level planning | 100 ms-1 s | Motion planning, task planning | Jetson AGX, industrial PC |
| Model training | Hours to days | Large-scale RL/VLA training | GPU cluster |
NVIDIA Jetson Series
Jetson is the de facto standard for robot AI computing. The current main product line is the Orin series.
Orin Series Comparison
| Model | GPU | CPU | AI Performance | Memory | Storage Interface | Power | Ref. Price |
|---|---|---|---|---|---|---|---|
| Orin Nano 4GB | 512 CUDA | 6-core A78AE | 20 TOPS | 4GB LPDDR5 | NVMe | 7-15W | ~$199 |
| Orin Nano 8GB | 1024 CUDA | 6-core A78AE | 40 TOPS | 8GB LPDDR5 | NVMe | 7-15W | ~$299 |
| Orin NX 8GB | 1024 CUDA | 6-core A78AE | 70 TOPS | 8GB LPDDR5 | NVMe | 10-25W | ~$399 |
| Orin NX 16GB | 1024 CUDA | 8-core A78AE | 100 TOPS | 16GB LPDDR5 | NVMe | 10-25W | ~$599 |
| AGX Orin 32GB | 1792 CUDA | 8-core A78AE + 4-core A78 | 200 TOPS | 32GB LPDDR5 | NVMe | 15-50W | ~$999 |
| AGX Orin 64GB | 2048 CUDA | 12-core A78AE | 275 TOPS | 64GB LPDDR5 | NVMe | 15-60W | ~$1,599 |
Jetson Nano (Legacy)
The original Jetson Nano (128 CUDA Maxwell, 472 GFLOPS FP16, 4GB LPDDR4, 5-10W, ~$149) is gradually being replaced by the Orin Nano, but remains widely used in educational settings.
JetPack SDK
JetPack is the complete SDK for Jetson, including:
| Component | Description |
|---|---|
| L4T | Linux for Tegra (Ubuntu-based OS) |
| CUDA | GPU computing |
| cuDNN | Deep learning acceleration |
| TensorRT | Inference optimization engine (FP16/INT8 quantization, layer fusion) |
| VPI | Vision Programming Interface |
| Multimedia API | Hardware video encode/decode |
| DeepStream | Video analytics pipeline |
| Isaac ROS | Robot-specific ROS 2 acceleration packages |
Version Mapping
| JetPack | L4T | CUDA | Supported Hardware |
|---|---|---|---|
| 5.1.x | R35.x | 11.4 | All Orin series |
| 6.0+ | R36.x | 12.2+ | All Orin series |
Deployment Optimization
```bash
# Set high-performance mode
sudo nvpmodel -m 0    # MAXN mode (maximum performance)
sudo jetson_clocks    # Lock clocks at maximum frequency

# TensorRT model optimization: FP16 quantization, 4 GB builder workspace
trtexec --onnx=model.onnx \
        --saveEngine=model.engine \
        --fp16 \
        --workspace=4096
```

```python
# Python TensorRT inference example
import tensorrt as trt
import pycuda.autoinit  # noqa: F401 -- creates the CUDA context
import pycuda.driver as cuda

# Load the serialized engine
logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open("model.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Bind inputs/outputs, allocate device memory, execute inference
# Typical latency: ~5ms on Orin NX (INT8), ~20ms on Orin Nano (FP16)
```
Jetson Selection Recommendations
| Application | Recommended Model | Rationale |
|---|---|---|
| Education/entry-level | Orin Nano 8GB | Low cost, sufficient performance |
| Service robot (navigation + avoidance) | Orin NX 16GB | Balanced performance and power |
| Robot arm manipulation (VLA inference) | AGX Orin 32GB | Large models require large memory |
| Humanoid robot | AGX Orin 64GB | Multi-modal perception + whole-body control |
| Autonomous driving prototype | AGX Orin 64GB | Multi-sensor fusion |
Next Generation: Jetson Thor
Jetson Thor is NVIDIA's announced next-generation robot computing platform, built on the Blackwell GPU architecture:
| Metric | AGX Orin | Thor (Expected) |
|---|---|---|
| AI Performance | 275 TOPS | 800+ TOPS |
| Memory | 64GB | 128GB |
| GPU Architecture | Ampere | Blackwell |
| Target Application | General robotics | Humanoid robot foundation models |
Intel Movidius (Integrated into OpenVINO)
| Feature | Description |
|---|---|
| Chip | Myriad X VPU |
| Performance | ~4 TOPS |
| Power | ~1W |
| Features | Ultra-low power, USB accelerator stick form factor |
| SDK | Intel OpenVINO |
| Status | Standalone chips no longer produced; integrated into Intel platforms |
Google Coral TPU
| Feature | Description |
|---|---|
| Chip | Edge TPU |
| Performance | 4 TOPS (INT8) |
| Power | ~2W |
| Form Factor | USB accelerator / Dev Board / M.2 module |
| SDK | TensorFlow Lite |
| Features | INT8-only, extremely low inference latency |
Edge Accelerator Comparison
| Platform | Performance | Power | Ecosystem | Flexibility | Price |
|---|---|---|---|---|---|
| Jetson Orin Nano | 40 TOPS | 15W | CUDA/TensorRT | Very high | $299 |
| Coral TPU | 4 TOPS | 2W | TF Lite | Low | $60 |
| OpenVINO (Intel) | ~5 TOPS | 5W | OpenVINO | Medium | $80 |
| Hailo-8 | 26 TOPS | 3W | Hailo SDK | Medium | ~$100 |
| Rockchip RK3588 | 6 TOPS | 5-10W | RKNN | Medium | ~$100 |
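Reading the table by performance per watt makes the tradeoffs clearer. A rough sketch using the table's figures (vendor peak TOPS at differing precisions, and the upper power figures, so treat the result as indicative only):

```python
# Edge accelerator efficiency from the comparison table (TOPS / W).
# Figures are vendor peak numbers at different precisions, taken at the
# upper end of each power range -- a rough comparison, not a benchmark.
platforms = {
    "Jetson Orin Nano": (40, 15),
    "Coral TPU": (4, 2),
    "OpenVINO (Intel)": (5, 5),
    "Hailo-8": (26, 3),
    "Rockchip RK3588": (6, 10),
}

efficiency = {name: tops / watts for name, (tops, watts) in platforms.items()}
ranking = sorted(efficiency, key=efficiency.get, reverse=True)
print(ranking[0], round(efficiency[ranking[0]], 2))  # Hailo-8 8.67
```

Dedicated accelerators like Hailo-8 lead on raw TOPS/W, but the Jetson's CUDA flexibility is often worth the efficiency gap when the workload is not a fixed, pre-compiled network.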
Onboard vs Cloud Computing
In robot systems, different tasks have different latency requirements, necessitating proper partitioning of local and cloud computation.
Latency Requirements and Compute Location
| Control Tier | Frequency Req. | Latency Tolerance | Compute Location | Example |
|---|---|---|---|---|
| Low-level motor control | 1-10 kHz | <1 ms | Local (FPGA/MCU) | PID torque control |
| Mid-level motion control | 100-500 Hz | 2-10 ms | Local (Jetson/MCU) | Trajectory tracking |
| High-level policy inference | 10-50 Hz | 20-100 ms | Local (Jetson) | Visual policy inference |
| Language understanding/planning | 0.1-1 Hz | 100 ms-seconds | Cloud or local | VLM task planning |
| Training/fine-tuning | Offline | Unconstrained | Cloud | Policy model training |
Key Principle: 1 kHz-level control loops must run locally and never depend on the network; 10 Hz-level high-level planning can offload to the cloud, but always needs a local fallback mechanism.
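The fallback requirement can be sketched concretely: query the cloud planner with a hard deadline and fall back to an on-board lightweight policy when the deadline is missed. All function names below are illustrative, not from any specific framework; the network call is simulated with a sleep:

```python
import concurrent.futures
import time

def cloud_plan(goal):
    """Stand-in for a cloud VLM planning call (may be slow or unreachable)."""
    time.sleep(0.5)  # simulate network round-trip + inference latency
    return f"cloud plan for {goal}"

def local_plan(goal):
    """Lightweight on-board fallback policy -- always available."""
    return f"local plan for {goal}"

def plan_with_fallback(goal, deadline_s=0.1):
    """Try the cloud planner, but never block the robot past the deadline."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(cloud_plan, goal)
    try:
        return future.result(timeout=deadline_s)
    except concurrent.futures.TimeoutError:
        return local_plan(goal)
    finally:
        pool.shutdown(wait=False)  # don't block the control path on the slow call

print(plan_with_fallback("fetch cup"))  # -> local plan for fetch cup
```

The same pattern applies whatever the transport: the local tier owns the deadline, and the cloud result is an optional upgrade rather than a dependency.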
Cloud / Workstation GPUs
Training Card Comparison
| GPU | VRAM | FP16 TFLOPS | Interconnect | Price | Typical Use |
|---|---|---|---|---|---|
| RTX 4090 | 24GB | 330 | PCIe | ~$1,600 | Personal research |
| A100 80GB | 80GB | 312 | NVLink | ~$15,000 | Lab training |
| H100 80GB | 80GB | 990 | NVLink/NVSwitch | ~$30,000 | Large-scale training |
| H200 | 141GB HBM3e | 990 | NVLink | ~$35,000 | VLA large models |
GPU Requirements for Robot AI
| Task | Model Scale | Minimum GPU | Recommended GPU |
|---|---|---|---|
| Small RL policy training | <10M params | RTX 3060 | RTX 4090 |
| Isaac Lab parallel training | — | RTX 3080 | A100 |
| VLA fine-tuning (7B) | 7B params | A100 40GB | 2x A100 80GB |
| VLA pretraining | 7B+ params | 8x A100 | 8x H100 |
| Real-time VLA inference | 3B params | Jetson AGX Orin | — |
| Real-time small model inference | <100M params | Jetson Orin NX | — |
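The fine-tuning rows follow from a common memory rule of thumb: full fine-tuning with Adam in mixed precision costs roughly 16 bytes per parameter, before activations. A quick sanity check (the 16 B/param breakdown is an approximation, not an exact figure):

```python
def full_finetune_vram_gb(params_billions):
    """Rough mixed-precision training footprint, excluding activations:
    2 B (fp16 weights) + 4 B (fp32 master weights)
    + 8 B (Adam moments, fp32) + 2 B (fp16 gradients) = 16 B/param."""
    return params_billions * 16

print(full_finetune_vram_gb(7))  # 112
```

112 GB for a 7B model, before activation memory, is why the table recommends 2x A100 80GB; parameter-efficient methods (LoRA, gradient checkpointing) can bring this down substantially.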
Inference Optimization Pipeline
Common optimization techniques for deploying models at the edge:
- Quantization: FP32 -> FP16 -> INT8, reducing computational requirements 2-8x
- Distillation: Transfer large model knowledge to smaller models
- Pruning: Remove redundant weights
- TensorRT optimization: Layer fusion, memory optimization, 2-5x inference speedup
```
Training (A100/H100, FP32)
    |  Export ONNX
    v
TensorRT conversion (FP16/INT8)
    |  Deploy to Jetson
    v
Inference (Jetson Orin, INT8): 50ms -> 8ms, 400MB -> 100MB
```
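To make the quantization step concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization, the basic idea behind TensorRT's INT8 mode (real deployments add calibration data and per-channel scales):

```python
def quantize_int8(values):
    """Map floats to int8 with a single symmetric scale: q = round(x / s)."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.5, -1.0, 0.25, 0.1]
q, s = quantize_int8(weights)
restored = dequantize(q, s)
# Each value is recovered to within one quantization step (the scale);
# the storage cost drops from 4 bytes to 1 byte per value.
```

The 4x storage reduction matches the 400MB -> 100MB figure above; the latency gain comes from INT8 tensor-core throughput rather than the smaller memory footprint alone.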
FPGA in Robotics
FPGAs are used for real-time control scenarios requiring microsecond-level deterministic latency.
Typical Applications
| Application | Description | Latency Requirement |
|---|---|---|
| Motor FOC control | Field-oriented control, PWM generation | <10 us |
| EtherCAT master | Real-time industrial communication | <1 ms |
| Sensor preprocessing | Encoder counting, ADC sampling | <1 us |
| Safety monitoring | Force/position limits, emergency stop | <10 us |
| Platform | Chip | Features | Price | Application |
|---|---|---|---|---|
| Xilinx Zynq-7000 | ARM + FPGA | SoC, embedded + logic | ~$200 | Motor control |
| Intel Cyclone V | ARM + FPGA | Low-cost SoC | ~$150 | Education/prototype |
| Xilinx Kria KV260 | Zynq UltraScale+ | Vision AI + real-time control | ~$250 | Robot vision |
| Lattice iCE40 | — | Ultra-low power, open-source toolchain | ~$50 | Simple control logic |
FPGA vs MCU Comparison
| Feature | FPGA | MCU (STM32, etc.) |
|---|---|---|
| Latency | <1 us | 1-100 us |
| Parallelism | True hardware parallelism | Pseudo-parallelism (interrupts) |
| Development difficulty | High (HDL/Verilog) | Low (C/C++) |
| Flexibility | Hardware reconfigurable | Fixed architecture |
| Cost | Higher | Low |
| Typical scenario | Multi-axis synchronous control | Single-axis PID control |
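For contrast with the FPGA column, the MCU side typically runs a fixed-timestep discrete PID loop. A minimal sketch against a toy first-order plant (the gains and plant model are illustrative, not tuned for any real motor):

```python
class PID:
    """Discrete PID controller at a fixed sample time dt."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt          # rectangular integration
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Simulate a 1 kHz control loop on a first-order plant: x' = -x + u
pid = PID(kp=5.0, ki=5.0, kd=0.0, dt=0.001)
x = 0.0
for _ in range(5000):                # 5 seconds of simulated time
    u = pid.update(1.0, x)           # command toward setpoint 1.0
    x += (-x + u) * 0.001            # forward-Euler plant step

print(round(x, 3))  # 1.0
```

On a real MCU the same `update` runs in a timer interrupt; the entire loop is a few dozen fixed-point operations, which is why a 1-10 kHz rate is comfortably within STM32-class hardware.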
Power Budget for Mobile Robots
Mobile robots have limited battery capacity; computing platform power directly affects battery life.
Typical Power Distribution (Mobile Manipulation Robot)
| Subsystem | Power Share | Typical Power |
|---|---|---|
| Mobile base motors | 40-50% | 50-200 W |
| Robot arm motors | 20-30% | 20-100 W |
| Computing platform | 10-20% | 10-60 W |
| Sensors | 5-10% | 5-20 W |
| Communication | 2-5% | 2-10 W |
Power Optimization Strategies
- Dynamic frequency scaling: Lower GPU/CPU frequency when idle (use `nvpmodel` to switch power modes)
- On-demand model loading: Switch to lightweight models when complex inference is unnecessary
- Sensor sleep: Non-essential sensors can sample intermittently
- Mixed-precision inference: Use lower precision for non-critical tasks (INT8 reduces power ~40% vs FP16)
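The budget translates directly into runtime: usable energy divided by total draw. A quick sketch with mid-range numbers from the table (the 0.9 conversion-efficiency factor and the 500 Wh pack are assumptions for illustration):

```python
def runtime_hours(capacity_wh, loads_w, efficiency=0.9):
    """Estimated runtime: usable battery energy divided by total power draw."""
    return capacity_wh * efficiency / sum(loads_w.values())

# Mid-range draws from the power distribution table (watts)
loads = {"base": 120, "arm": 50, "compute": 30, "sensors": 10, "comms": 5}
print(round(runtime_hours(500, loads), 2))  # 2.09
```

Note how little the compute platform moves the total: cutting it from 30 W to 15 W buys only a few extra minutes here, so power optimization matters most on small platforms where motors draw less.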
Heterogeneous Computing Architecture
Real robot systems typically employ heterogeneous computing architectures with multi-tier hardware collaboration:
| Tier | Hardware | Communication | Function |
|---|---|---|---|
| Level 0 | MCU (STM32H7) | CAN/SPI | Motor control (10kHz) |
| Level 1 | FPGA (optional) | EtherCAT | Real-time safety monitoring |
| Level 2 | Jetson (Orin) | Ethernet/USB3 | AI inference + ROS2 |
| Level 3 | Cloud GPU | WiFi/5G | Training, remote monitoring |
```
+------------------------------------------+
|          Cloud (GPU Cluster)             |
|      Train VLA / Large-scale sim         |
+-----------------+------------------------+
                  | WiFi / 5G
+-----------------+------------------------+
|        Jetson AGX Orin (ROS2)            |
|  Perception | SLAM | Planning | VLA      |
+----+----------+----------+---------------+
     | USB3     | Ethernet | EtherCAT
+----+----+ +---+---+ +---+--------------+
| Camera  | | LiDAR | | MCU (STM32)      |
| D435i   | | Mid360| | Motor FOC 10kHz  |
+---------+ +-------+ | Encoder readout  |
                      | Safety limits    |
                      +------------------+
```
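A common pattern on the Jetson tier is multi-rate scheduling: the fast control callback runs every tick while slower inference fires on its own period. A simulated-time sketch of the rate bookkeeping (no real scheduler or ROS2 executor involved):

```python
# Multi-rate loop bookkeeping: 1 kHz control, 10 Hz inference,
# simulated over one second of virtual time.
CONTROL_HZ = 1000
INFERENCE_HZ = 10
DT = 1.0 / CONTROL_HZ

control_ticks = 0
inference_ticks = 0
next_inference_t = 0.0
t = 0.0

for _ in range(CONTROL_HZ):          # one second of ticks
    control_ticks += 1               # 1 kHz: runs every tick
    if t >= next_inference_t:        # 10 Hz: runs when its period elapses
        inference_ticks += 1
        next_inference_t += 1.0 / INFERENCE_HZ
    t += DT

print(control_ticks, inference_ticks)  # 1000 10
```

In a real system the two rates live on different hardware (MCU vs Jetson) precisely so that a slow inference step can never delay a control tick; the sketch only illustrates the 100:1 rate partitioning.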
Selection Decision Process
- Determine inference model size: Parameter count determines minimum memory requirements
- Determine inference frequency: Control >100Hz requires high compute; perception at 30Hz is less demanding
- Power budget: Mobile robots are strictly constrained; fixed installations are more relaxed
- ROS2 support: Jetson ecosystem is most comprehensive
- Cost constraints: Orin Nano for education, AGX Orin for research
For a more detailed selection framework, see Hardware Selection Guide.