Skip to content

Debugging and Testing

Overview

Debugging is the most time-consuming phase of robot development -- coding accounts for 20%, debugging for 80%. Good debugging methods and tools can dramatically improve efficiency. This section covers hardware debugging, software debugging, and system-level testing.


Hardware Debugging Tools

Multimeter

The most basic and most important debugging tool.

Essential Checks:

Measurement Method Expected Value
Supply voltage DC voltage mode Within ±5% of design value
Ground continuity Continuity mode (buzzer) All GND connected
Current draw Series current mode Does not exceed rated value
Short circuit check Resistance mode VCC-GND should not be shorted
Signal level DC voltage mode High/Low levels correct

Debugging Steps (must do before power-on):

  1. Check VCC-GND for shorts
  2. Confirm all connectors are properly seated
  3. Measure all voltage rails immediately after power-on
  4. Verify current draw is within expected range

Oscilloscope

Essential tool for observing time-varying signals.

Use Cases:

Signal What to Observe Common Problems
PWM Frequency, duty cycle, rising edge Wrong frequency, excessive jitter
SPI Clock, data, chip select timing CPOL/CPHA misconfiguration
I2C SCL/SDA waveforms Improper pull-up resistors
UART Baud rate, data frames Baud rate mismatch
CAN Differential signal, bit timing Missing termination resistor
Power Ripple, transient response Excessive ripple, voltage drops

Recommended Equipment:

Device Bandwidth Price Suitable For
Rigol DS1054Z 50 MHz ~$400 Entry-level (hackable to 100MHz)
Siglent SDS1104 100 MHz ~$400 Good value
Keysight DSOX1204G 70 MHz ~$500 Brand reliability
Analog Discovery 2 30 MHz ~$300 Portable + logic analyzer

Logic Analyzer

Powerful tool for observing digital communication protocols.

Saleae Logic Series (recommended):

  • Logic 8: 8 channels, 25 MS/s, ~$400
  • Logic Pro 16: 16 channels, 500 MS/s, ~$1000
  • Software auto-decodes: SPI, I2C, UART, CAN, 1-Wire, etc.

Use Case:

Problem: STM32 SPI communication with IMU fails
Troubleshooting:
1. Connect logic analyzer to SCK, MOSI, MISO, CS lines
2. Software auto-decodes SPI frames
3. Found: CS signal was inverted (Active High instead of Active Low)
4. Fix: Configure SPI CS polarity

Budget Alternatives:

  • Sigrok + fx2lafw compatible analyzers (~$10)
  • Raspberry Pi Pico repurposed as logic analyzer (~$5)

Serial Terminal

Basic communication debugging with MCUs/sensors:

Tool Platform Features
minicom Linux Lightweight CLI
PuTTY Windows Classic GUI
screen Linux/Mac Simplest
CoolTerm Cross-platform Feature-rich
Arduino Serial Monitor Cross-platform Beginner-friendly
# Linux serial connection
minicom -D /dev/ttyUSB0 -b 115200

# Quick serial data view
screen /dev/ttyUSB0 115200

ROS2 Debugging Tools

Command-Line Tools

# List all active topics
ros2 topic list

# View topic data (real-time)
ros2 topic echo /imu/data

# Check topic frequency
ros2 topic hz /camera/image_raw

# Check topic bandwidth
ros2 topic bw /camera/image_raw

# View topic message type
ros2 topic info /cmd_vel

# List all nodes
ros2 node list

# View node info (publishers/subscribers/services)
ros2 node info /controller_node

# Publish test message
ros2 topic pub /cmd_vel geometry_msgs/msg/Twist \
    "{linear: {x: 0.5}, angular: {z: 0.3}}"

# Call service
ros2 service call /set_mode std_srvs/srv/SetBool "{data: true}"

# View tf tree
ros2 run tf2_tools view_frames

rqt Tool Suite

Tool Command Function
rqt_graph rqt_graph Node-topic connection graph
rqt_plot rqt_plot Real-time data plotting
rqt_image_view rqt_image_view Image topic viewer
rqt_console rqt_console Log viewer/filter
rqt_reconfigure rqt_reconfigure Dynamic parameter tuning

rviz2

3D visualization is the core tool for debugging robot perception and motion:

Common Display Types:

Display Type Topic Use
RobotModel /robot_description Robot model
TF /tf Coordinate frame visualization
PointCloud2 /lidar/points Point cloud
Image /camera/image_raw Image
LaserScan /scan 2D laser scan
Marker /visualization_marker Custom markers
Path /planned_path Planned path
Wrench /ft_sensor/wrench Force/Torque vectors

rosbag2 Data Recording

# Record all topics
ros2 bag record -a

# Record specific topics
ros2 bag record /imu/data /camera/image_raw /joint_states

# Playback
ros2 bag play my_recording/

# View recording info
ros2 bag info my_recording/

Debugging workflow: Record when problems occur -> Replay later for analysis -> No need to reproduce the problem.

Foxglove Studio

Modern web-based visualization tool, replacing rviz2 + rqt:

  • Supports ROS2 live connection and rosbag playback
  • Rich panels: 3D, charts, images, logs
  • Remote access (web browser)
  • Shareable visualization configurations

System-Level Testing

Testing Pyramid

        ╱╲
       ╱  ╲
      ╱ Field ╲        ← Few, high cost
     ╱  Tests  ╲
    ╱──────────╲
   ╱ Integration ╲     ← Medium, medium cost
  ╱    Tests      ╲
 ╱──────────────────╲
╱    Unit Tests      ╲  ← Many, low cost
╱──────────────────────╲

Unit Testing

Test each node/module independently:

# test_controller.py
import pytest
from my_robot.controller import PIDController

def test_pid_output():
    pid = PIDController(kp=1.0, ki=0.1, kd=0.01)

    # Set target
    pid.set_target(10.0)

    # Current value below target, output should be positive
    output = pid.compute(current=5.0, dt=0.01)
    assert output > 0

    # At target, output should be near zero
    for _ in range(1000):
        output = pid.compute(current=10.0, dt=0.01)
    assert abs(output) < 0.1

def test_pid_limits():
    pid = PIDController(kp=100.0, ki=0, kd=0)
    pid.set_target(100.0)
    pid.set_output_limits(-50, 50)

    output = pid.compute(current=0.0, dt=0.01)
    assert output <= 50  # Should be clamped

ROS2 integration test framework:

# test_camera_node.py (launch_testing)
import launch_testing
import rclpy
from sensor_msgs.msg import Image

def test_camera_publishes(self):
    """Verify camera node publishes images within 5 seconds"""
    received = []

    def callback(msg):
        received.append(msg)

    node.create_subscription(Image, '/camera/image_raw', callback, 10)

    # Wait up to 5 seconds
    end_time = time.time() + 5.0
    while time.time() < end_time and len(received) == 0:
        rclpy.spin_once(node, timeout_sec=0.1)

    assert len(received) > 0, "No image received within 5 seconds"
    assert received[0].width == 640
    assert received[0].height == 480

Integration Testing

Verify interactions between subsystems:

Test Item Verification Content
Perception -> Control Sensor data correctly passed to controller
Control -> Execution Control commands correctly drive motors
End-to-end latency Latency from perception to execution
Fault propagation One node crash does not affect others
Resource consumption CPU/Memory/Bandwidth within acceptable range

Field Testing

Testing in the actual environment:

=== Field Test Checklist ===

Before startup:
[ ] Battery fully charged
[ ] All connectors secure
[ ] Emergency stop button functional
[ ] Remote controller connected
[ ] Test area clear and safe

Basic functions:
[ ] Stationary startup, no abnormal vibration
[ ] Low-speed forward/backward/turning
[ ] Sensor data streams normal
[ ] Emergency stop functional

Performance testing:
[ ] Maximum speed test
[ ] Slope/Obstacle test
[ ] Continuous operation for 30 minutes
[ ] Battery consumption log

Fault testing:
[ ] Remote controller disconnect -> safe stop
[ ] Collision -> detection and stop
[ ] Low battery -> warning and shutdown

Common Failure Modes and Diagnosis

Hardware Failures

Symptom Possible Cause Diagnostic Method
Device not powering on Fuse blown/Poor contact Multimeter voltage check
Intermittent disconnection Loose connector/Cold solder joint Wiggle cable and observe
Motor jitter Encoder noise/Grounding issue Oscilloscope on encoder signal
Abnormal sensor readings EMI interference Compare with motors off
Thermal shutdown Inadequate cooling Thermal imaging/Temperature sensor
Communication errors Baud rate/Termination resistor Logic analyzer packet capture

Software Failures

Symptom Possible Cause Diagnostic Method
Node crash Segfault/Unhandled exception GDB/core dump
Increasing data latency Processing bottleneck/Queue overflow top/htop for CPU
Topics not communicating QoS mismatch/DDS configuration ros2 doctor
tf errors Missing transform/Timestamp issues ros2 run tf2_tools view_frames
Memory leak Unreleased buffers Valgrind / AddressSanitizer
Data desynchronization Inconsistent timestamps Check header.stamp

System-Level Failures

Symptom Possible Cause Diagnostic Method
Periodic performance drops Thermal throttling sensors / Temperature logs
Non-deterministic behavior Race conditions/Timing issues Add logs + timestamp analysis
Occasional crashes Power transients/EMI Power ripple measurement
Localization drift IMU integration error/Calibration issues Compare with ground truth system

Debugging Methodology

Binary Search Debugging

When the system does not work, do not blindly try things -- use binary search to narrow down the problem:

Full system not working
├── Hardware or Software? → Verify hardware with simplest test program
│   ├── Hardware issue
│   │   ├── Power or Signal? → Multimeter voltage check
│   │   └── Which module? → Disconnect one by one
│   └── Software issue
│       ├── Which node? → Start one by one
│       └── Which function? → Comment out code, binary search

Minimal Reproduction

When a bug is found:

  1. Record the phenomenon: Screenshots/Screen recording/rosbag
  2. Simplify the environment: Remove unrelated modules
  3. Minimal reproduction: Trigger the problem with fewest steps
  4. Fix and verify: After fixing, verify with the same steps

Logging Best Practices

import rclpy
from rclpy.node import Node

class MyNode(Node):
    def __init__(self):
        super().__init__('my_node')

    def process(self):
        self.get_logger().debug('Detailed internal state: ...')  # Development
        self.get_logger().info('Normal operation info')           # Routine
        self.get_logger().warn('Approaching limit: temp=85°C')   # Attention needed
        self.get_logger().error('Sensor read failed')             # Error
        self.get_logger().fatal('Unrecoverable error')            # Critical
# Set log level at runtime
ros2 run my_pkg my_node --ros-args --log-level debug

Performance Analysis

CPU Performance

# Real-time monitoring
htop

# ROS2 node CPU usage
ros2 run rqt_top rqt_top

# Code-level profiling (C++)
perf record ros2 run my_pkg my_node
perf report

# Python profiling
python3 -m cProfile -o profile.out my_script.py

Communication Latency

# Measure topic end-to-end latency
def callback(msg):
    now = self.get_clock().now()
    msg_time = rclpy.time.Time.from_msg(msg.header.stamp)
    latency = (now - msg_time).nanoseconds / 1e6  # ms
    self.get_logger().info(f'Latency: {latency:.1f} ms')

Bandwidth Monitoring

# Network bandwidth
iftop

# ROS2 topic bandwidth
ros2 topic bw /camera/image_raw
# Output: average: 18.5 MB/s

References


评论 #