Debugging and Testing
Overview
Debugging is the most time-consuming phase of robot development -- coding accounts for 20%, debugging for 80%. Good debugging methods and tools can dramatically improve efficiency. This section covers hardware debugging, software debugging, and system-level testing.
Hardware Debugging Tools
Multimeter
The most basic and most important debugging tool.
Essential Checks:
| Measurement | Method | Expected Value |
|---|---|---|
| Supply voltage | DC voltage mode | Within ±5% of design value |
| Ground continuity | Continuity mode (buzzer) | All GND connected |
| Current draw | Series current mode | Does not exceed rated value |
| Short circuit check | Resistance mode | VCC-GND should not be shorted |
| Signal level | DC voltage mode | High/Low levels correct |
Debugging Steps (must do before power-on):
- Check VCC-GND for shorts
- Confirm all connectors are properly seated
- Measure all voltage rails immediately after power-on
- Verify current draw is within expected range
Oscilloscope
Essential tool for observing time-varying signals.
Use Cases:
| Signal | What to Observe | Common Problems |
|---|---|---|
| PWM | Frequency, duty cycle, rising edge | Wrong frequency, excessive jitter |
| SPI | Clock, data, chip select timing | CPOL/CPHA misconfiguration |
| I2C | SCL/SDA waveforms | Improper pull-up resistors |
| UART | Baud rate, data frames | Baud rate mismatch |
| CAN | Differential signal, bit timing | Missing termination resistor |
| Power | Ripple, transient response | Excessive ripple, voltage drops |
Recommended Equipment:
| Device | Bandwidth | Price | Suitable For |
|---|---|---|---|
| Rigol DS1054Z | 50 MHz | ~$400 | Entry-level (hackable to 100MHz) |
| Siglent SDS1104 | 100 MHz | ~$400 | Good value |
| Keysight DSOX1204G | 70 MHz | ~$500 | Brand reliability |
| Analog Discovery 2 | 30 MHz | ~$300 | Portable + logic analyzer |
Logic Analyzer
Powerful tool for observing digital communication protocols.
Saleae Logic Series (recommended):
- Logic 8: 8 channels, 25 MS/s, ~$400
- Logic Pro 16: 16 channels, 500 MS/s, ~$1000
- Software auto-decodes: SPI, I2C, UART, CAN, 1-Wire, etc.
Use Case:
Problem: STM32 SPI communication with IMU fails
Troubleshooting:
1. Connect logic analyzer to SCK, MOSI, MISO, CS lines
2. Software auto-decodes SPI frames
3. Found: CS signal was inverted (Active High instead of Active Low)
4. Fix: Configure SPI CS polarity
Budget Alternatives:
- Sigrok + fx2lafw compatible analyzers (~$10)
- Raspberry Pi Pico repurposed as logic analyzer (~$5)
Serial Terminal
Basic communication debugging with MCUs/sensors:
| Tool | Platform | Features |
|---|---|---|
| minicom | Linux | Lightweight CLI |
| PuTTY | Windows | Classic GUI |
| screen | Linux/Mac | Simplest |
| CoolTerm | Cross-platform | Feature-rich |
| Arduino Serial Monitor | Cross-platform | Beginner-friendly |
# Linux serial connection
minicom -D /dev/ttyUSB0 -b 115200
# Quick serial data view
screen /dev/ttyUSB0 115200
ROS2 Debugging Tools
Command-Line Tools
# List all active topics
ros2 topic list
# View topic data (real-time)
ros2 topic echo /imu/data
# Check topic frequency
ros2 topic hz /camera/image_raw
# Check topic bandwidth
ros2 topic bw /camera/image_raw
# View topic message type
ros2 topic info /cmd_vel
# List all nodes
ros2 node list
# View node info (publishers/subscribers/services)
ros2 node info /controller_node
# Publish test message
ros2 topic pub /cmd_vel geometry_msgs/msg/Twist \
"{linear: {x: 0.5}, angular: {z: 0.3}}"
# Call service
ros2 service call /set_mode std_srvs/srv/SetBool "{data: true}"
# View tf tree
ros2 run tf2_tools view_frames
rqt Tool Suite
| Tool | Command | Function |
|---|---|---|
| rqt_graph | rqt_graph |
Node-topic connection graph |
| rqt_plot | rqt_plot |
Real-time data plotting |
| rqt_image_view | rqt_image_view |
Image topic viewer |
| rqt_console | rqt_console |
Log viewer/filter |
| rqt_reconfigure | rqt_reconfigure |
Dynamic parameter tuning |
rviz2
3D visualization is the core tool for debugging robot perception and motion:
Common Display Types:
| Display Type | Topic | Use |
|---|---|---|
| RobotModel | /robot_description |
Robot model |
| TF | /tf |
Coordinate frame visualization |
| PointCloud2 | /lidar/points |
Point cloud |
| Image | /camera/image_raw |
Image |
| LaserScan | /scan |
2D laser scan |
| Marker | /visualization_marker |
Custom markers |
| Path | /planned_path |
Planned path |
| Wrench | /ft_sensor/wrench |
Force/Torque vectors |
rosbag2 Data Recording
# Record all topics
ros2 bag record -a
# Record specific topics
ros2 bag record /imu/data /camera/image_raw /joint_states
# Playback
ros2 bag play my_recording/
# View recording info
ros2 bag info my_recording/
Debugging workflow: Record when problems occur -> Replay later for analysis -> No need to reproduce the problem.
Foxglove Studio
Modern web-based visualization tool, replacing rviz2 + rqt:
- Supports ROS2 live connection and rosbag playback
- Rich panels: 3D, charts, images, logs
- Remote access (web browser)
- Shareable visualization configurations
System-Level Testing
Testing Pyramid
╱╲
╱ ╲
╱ Field ╲ ← Few, high cost
╱ Tests ╲
╱──────────╲
╱ Integration ╲ ← Medium, medium cost
╱ Tests ╲
╱──────────────────╲
╱ Unit Tests ╲ ← Many, low cost
╱──────────────────────╲
Unit Testing
Test each node/module independently:
# test_controller.py
import pytest
from my_robot.controller import PIDController
def test_pid_output():
pid = PIDController(kp=1.0, ki=0.1, kd=0.01)
# Set target
pid.set_target(10.0)
# Current value below target, output should be positive
output = pid.compute(current=5.0, dt=0.01)
assert output > 0
# At target, output should be near zero
for _ in range(1000):
output = pid.compute(current=10.0, dt=0.01)
assert abs(output) < 0.1
def test_pid_limits():
pid = PIDController(kp=100.0, ki=0, kd=0)
pid.set_target(100.0)
pid.set_output_limits(-50, 50)
output = pid.compute(current=0.0, dt=0.01)
assert output <= 50 # Should be clamped
ROS2 integration test framework:
# test_camera_node.py (launch_testing)
import launch_testing
import rclpy
from sensor_msgs.msg import Image
def test_camera_publishes(self):
"""Verify camera node publishes images within 5 seconds"""
received = []
def callback(msg):
received.append(msg)
node.create_subscription(Image, '/camera/image_raw', callback, 10)
# Wait up to 5 seconds
end_time = time.time() + 5.0
while time.time() < end_time and len(received) == 0:
rclpy.spin_once(node, timeout_sec=0.1)
assert len(received) > 0, "No image received within 5 seconds"
assert received[0].width == 640
assert received[0].height == 480
Integration Testing
Verify interactions between subsystems:
| Test Item | Verification Content |
|---|---|
| Perception -> Control | Sensor data correctly passed to controller |
| Control -> Execution | Control commands correctly drive motors |
| End-to-end latency | Latency from perception to execution |
| Fault propagation | One node crash does not affect others |
| Resource consumption | CPU/Memory/Bandwidth within acceptable range |
Field Testing
Testing in the actual environment:
=== Field Test Checklist ===
Before startup:
[ ] Battery fully charged
[ ] All connectors secure
[ ] Emergency stop button functional
[ ] Remote controller connected
[ ] Test area clear and safe
Basic functions:
[ ] Stationary startup, no abnormal vibration
[ ] Low-speed forward/backward/turning
[ ] Sensor data streams normal
[ ] Emergency stop functional
Performance testing:
[ ] Maximum speed test
[ ] Slope/Obstacle test
[ ] Continuous operation for 30 minutes
[ ] Battery consumption log
Fault testing:
[ ] Remote controller disconnect -> safe stop
[ ] Collision -> detection and stop
[ ] Low battery -> warning and shutdown
Common Failure Modes and Diagnosis
Hardware Failures
| Symptom | Possible Cause | Diagnostic Method |
|---|---|---|
| Device not powering on | Fuse blown/Poor contact | Multimeter voltage check |
| Intermittent disconnection | Loose connector/Cold solder joint | Wiggle cable and observe |
| Motor jitter | Encoder noise/Grounding issue | Oscilloscope on encoder signal |
| Abnormal sensor readings | EMI interference | Compare with motors off |
| Thermal shutdown | Inadequate cooling | Thermal imaging/Temperature sensor |
| Communication errors | Baud rate/Termination resistor | Logic analyzer packet capture |
Software Failures
| Symptom | Possible Cause | Diagnostic Method |
|---|---|---|
| Node crash | Segfault/Unhandled exception | GDB/core dump |
| Increasing data latency | Processing bottleneck/Queue overflow | top/htop for CPU |
| Topics not communicating | QoS mismatch/DDS configuration | ros2 doctor |
| tf errors | Missing transform/Timestamp issues | ros2 run tf2_tools view_frames |
| Memory leak | Unreleased buffers | Valgrind / AddressSanitizer |
| Data desynchronization | Inconsistent timestamps | Check header.stamp |
System-Level Failures
| Symptom | Possible Cause | Diagnostic Method |
|---|---|---|
| Periodic performance drops | Thermal throttling | sensors / Temperature logs |
| Non-deterministic behavior | Race conditions/Timing issues | Add logs + timestamp analysis |
| Occasional crashes | Power transients/EMI | Power ripple measurement |
| Localization drift | IMU integration error/Calibration issues | Compare with ground truth system |
Debugging Methodology
Binary Search Debugging
When the system does not work, do not blindly try things -- use binary search to narrow down the problem:
Full system not working
├── Hardware or Software? → Verify hardware with simplest test program
│ ├── Hardware issue
│ │ ├── Power or Signal? → Multimeter voltage check
│ │ └── Which module? → Disconnect one by one
│ └── Software issue
│ ├── Which node? → Start one by one
│ └── Which function? → Comment out code, binary search
Minimal Reproduction
When a bug is found:
- Record the phenomenon: Screenshots/Screen recording/rosbag
- Simplify the environment: Remove unrelated modules
- Minimal reproduction: Trigger the problem with fewest steps
- Fix and verify: After fixing, verify with the same steps
Logging Best Practices
import rclpy
from rclpy.node import Node
class MyNode(Node):
def __init__(self):
super().__init__('my_node')
def process(self):
self.get_logger().debug('Detailed internal state: ...') # Development
self.get_logger().info('Normal operation info') # Routine
self.get_logger().warn('Approaching limit: temp=85°C') # Attention needed
self.get_logger().error('Sensor read failed') # Error
self.get_logger().fatal('Unrecoverable error') # Critical
# Set log level at runtime
ros2 run my_pkg my_node --ros-args --log-level debug
Performance Analysis
CPU Performance
# Real-time monitoring
htop
# ROS2 node CPU usage
ros2 run rqt_top rqt_top
# Code-level profiling (C++)
perf record ros2 run my_pkg my_node
perf report
# Python profiling
python3 -m cProfile -o profile.out my_script.py
Communication Latency
# Measure topic end-to-end latency
def callback(msg):
now = self.get_clock().now()
msg_time = rclpy.time.Time.from_msg(msg.header.stamp)
latency = (now - msg_time).nanoseconds / 1e6 # ms
self.get_logger().info(f'Latency: {latency:.1f} ms')
Bandwidth Monitoring
# Network bandwidth
iftop
# ROS2 topic bandwidth
ros2 topic bw /camera/image_raw
# Output: average: 18.5 MB/s
References
- ROS2 Debugging Guide: docs.ros.org/en/humble/Tutorials/
- Saleae Logic Analyzer: saleae.com
- Foxglove Studio: foxglove.dev
- Rigol DS1054Z: EEVblog Review