V0.2 — Replacing mock skills with a truly autonomous Stretch
v0.1 proved the ANIMA cognition stack could close the loop under language input. v0.2 replaces the mock skills underneath with genuinely autonomous execution — every run queries object poses from the MJCF, solves IK, and steps physics. No script replay.
One pivot during implementation is worth recording.
Pivot: from ROS 2 + Gazebo to pure-Python MuJoCo
Original plan: Docker + ROS 2 Humble + Gazebo Harmonic + MoveIt 2 + Nav2 — the canonical "heavy" robot stack.
Three problems surfaced during execution:
- hello-robot does not ship a stretch_moveit2 apt package or GitHub config. We would have had to write the SRDF + kinematics config ourselves (3–5 hours of work)
- The stretch_mujoco Python API was already sufficient: sim.pull_camera_data() → RGB frames; sim.move_to(Actuators.lift, pos) + wait_until_at_setpoint → joint position control; sim.set_base_velocity(v, ω) → base velocity. The Stretch arm only has 5 DoF (lift Z + arm extension + wrist yaw/pitch/roll), so analytical IK fits in 10 lines
- Nav2 is visibly overkill for a closed single-room + hallway scene
New plan: pure-Python stretch_mujoco API + analytical IK + unicycle PID. ANIMA's L3 skill calls stretch_mujoco directly, no ROS nodes.
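To illustrate why analytical IK stays so small here: lift and arm extension are both prismatic joints on roughly orthogonal axes, so the position part of the problem decouples into two subtractions plus limit checks. The sketch below is a minimal version under stated assumptions, not the project's code. The function name, the offsets, and the joint limits are all hypothetical placeholders, and wrist angles are left out. It also assumes the base has already been driven so that the target lies on the arm's extension axis.

```python
from dataclasses import dataclass

# Hypothetical geometry and joint limits, for illustration only.
# Real values would have to come from the Stretch MJCF.
LIFT_MIN, LIFT_MAX = 0.0, 1.1   # metres
ARM_MIN, ARM_MAX = 0.0, 0.5     # metres
ARM_BASE_OFFSET = 0.25          # base centre to gripper at zero extension
LIFT_BASE_HEIGHT = 0.2          # gripper height at lift = 0


@dataclass
class ArmSolution:
    lift: float
    arm: float
    reachable: bool


def solve_arm_ik(target_lateral: float, target_height: float) -> ArmSolution:
    """Map a grasp point in the base frame to lift and arm setpoints.

    target_lateral: distance from the base centre along the arm's extension axis.
    target_height:  height of the grasp point above the floor.
    """
    lift = target_height - LIFT_BASE_HEIGHT
    arm = target_lateral - ARM_BASE_OFFSET
    reachable = LIFT_MIN <= lift <= LIFT_MAX and ARM_MIN <= arm <= ARM_MAX
    # Clamp so the caller always receives commandable setpoints.
    lift = min(max(lift, LIFT_MIN), LIFT_MAX)
    arm = min(max(arm, ARM_MIN), ARM_MAX)
    return ArmSolution(lift, arm, reachable)
```

The two setpoints would then be sent through the position-control calls listed above; an unreachable result is a signal that the base needs to move closer first.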
Side benefits that made this the right call:
- MuJoCo supports macOS natively, so the laptop can run the sim without VSCode Remote SSH
- Deployment collapses to three processes (FastAPI + Next.js + MuJoCo); no rosdep / colcon / DDS tuning
- The real-hardware path isn't closed: v1.x can swap SimSkillBehaviour for Ros2SkillBehaviour while leaving the py_trees orchestration untouched
The ROS 2 Humble + stretch_ros2 workspace stays around as a reference for future hardware work.
Six L3 skill primitives wired up
A new SimSkillBehaviour base class plus six concrete skills, all sharing one blackboard dict:
- locate: query the target body's pose in simulation
- navigate: unicycle PID with align / approach hysteresis to prevent oscillation
- grasp: analytical IK + close gripper
- lift: raise to a safe height
- deliver: drive to the bedside pose
- release: open gripper
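The align / approach hysteresis in navigate can be sketched as a two-mode controller with two different heading thresholds. The robot turns in place until the heading error drops below a tight threshold, then drives forward, and only goes back to turning in place if the error grows past a looser one. Everything below is an illustrative assumption: the class name, gains, and thresholds are hypothetical, and the loop is proportional-only rather than a full PID to keep it short.

```python
import math

ALIGN_ENTER = math.radians(25)  # re-enter "align" above this heading error
ALIGN_EXIT = math.radians(5)    # leave "align" below this heading error
GOAL_TOL = 0.05                 # metres
K_LIN, K_ANG = 0.8, 2.0         # proportional gains (hypothetical)
V_MAX, W_MAX = 0.3, 1.0         # velocity limits (hypothetical)


def wrap(angle: float) -> float:
    """Wrap an angle to [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


class UnicycleNavigator:
    def __init__(self) -> None:
        self.mode = "align"

    def step(self, x, y, theta, gx, gy):
        """Return (v, omega, done) for the current pose and goal position."""
        dist = math.hypot(gx - x, gy - y)
        if dist < GOAL_TOL:
            return 0.0, 0.0, True
        err = wrap(math.atan2(gy - y, gx - x) - theta)
        # Hysteresis: the gap between the two thresholds means heading
        # noise near either one cannot flip the mode on every tick.
        if self.mode == "align" and abs(err) < ALIGN_EXIT:
            self.mode = "approach"
        elif self.mode == "approach" and abs(err) > ALIGN_ENTER:
            self.mode = "align"
        omega = max(-W_MAX, min(W_MAX, K_ANG * err))
        v = min(V_MAX, K_LIN * dist) if self.mode == "approach" else 0.0
        return v, omega, False
```

Each tick, the returned pair would be passed to the base velocity command. With a single threshold, a heading error hovering near it would make the base alternate between stopping and driving, which is the oscillation the hysteresis is meant to prevent.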
The L2 Planner also gains a fallback branch: if the simulation is unavailable, it drops back to the v0.1 mock skills, so a zero-config local setup still runs end to end.
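A rough sketch of how such a fallback could sit next to the shared blackboard is shown below. It is plain Python with no py_trees dependency. The helper names (mock_skill, build_skills, run_sequence) and the "trace" key are hypothetical, and each skill is reduced to a callable that takes the blackboard dict and returns success or failure.

```python
from typing import Any, Callable, Dict, List

Blackboard = Dict[str, Any]
Skill = Callable[[Blackboard], bool]

# Order taken from the six primitives listed above.
SKILL_ORDER = ["locate", "navigate", "grasp", "lift", "deliver", "release"]


def mock_skill(name: str) -> Skill:
    """v0.1-style stand-in: records the call and always succeeds."""
    def run(bb: Blackboard) -> bool:
        bb.setdefault("trace", []).append(f"mock:{name}")
        return True
    return run


def build_skills(sim_skills: Dict[str, Skill], sim_available: bool) -> List[Skill]:
    """Use simulation-backed skills when the simulator is up, else mocks."""
    if sim_available:
        return [sim_skills[name] for name in SKILL_ORDER]
    return [mock_skill(name) for name in SKILL_ORDER]


def run_sequence(skills: List[Skill], bb: Blackboard) -> bool:
    """Run skills in order on one shared blackboard; stop at the first failure."""
    return all(skill(bb) for skill in skills)
```

Because every skill reads and writes the same dict, a later skill such as grasp can pick up the pose that locate stored earlier without any extra plumbing.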
Simulation view lands in the frontend
Backend sim/manager.py holds the StretchMujocoSimulator and renders a demo_view third-person camera itself. The stretch_mujoco camera pipeline doesn't expose custom MJCF cameras, so this path bypasses it via mujoco.Renderer directly.
Three routes:
- /api/sim/mjpeg: multipart MJPEG stream
- /api/sim/reset: reset scene to initial state
- /api/sim/status: health probe
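For readers unfamiliar with the multipart MJPEG format, the framing amounts to a boundary line and two headers wrapped around each JPEG. The generator below is a stdlib-only sketch of that framing. The boundary token and the function name are assumptions, and it expects frames that are already JPEG-encoded.

```python
from typing import Iterable, Iterator

BOUNDARY = "frame"  # hypothetical boundary token
MEDIA_TYPE = f"multipart/x-mixed-replace; boundary={BOUNDARY}"


def mjpeg_chunks(jpeg_frames: Iterable[bytes]) -> Iterator[bytes]:
    """Wrap each already-encoded JPEG in one multipart part."""
    for jpeg in jpeg_frames:
        header = (
            f"--{BOUNDARY}\r\n"
            "Content-Type: image/jpeg\r\n"
            f"Content-Length: {len(jpeg)}\r\n\r\n"
        ).encode("ascii")
        yield header + jpeg + b"\r\n"
```

In FastAPI, a generator like this would typically be handed to StreamingResponse with media_type set to MEDIA_TYPE. The browser then replaces the image each time a new part arrives, which is why a plain img element is enough on the frontend.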
Frontend adds a SimulationView.tsx component (an <img> fed by the MJPEG stream plus a Reset button). Layout expands from three columns to four (nav / sim live / intent+BT / factors).
L5 starts being data-driven
In l5_assessment.py, p_skill moves from a fixed 0.91 constant to a rolling success-rate read from pea_log.jsonl. PEA finally starts feeding back into GOA instead of being a static number.
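One way such a rolling rate could be computed is sketched below. It assumes each row of pea_log.jsonl is a JSON object with an outcome field whose value is "success" on success, as in the acceptance script. The window size, the function name, and the choice to fall back to the old 0.91 constant when the log is empty are assumptions for illustration.

```python
import json
from collections import deque
from pathlib import Path

PRIOR_P_SKILL = 0.91  # the former fixed constant, reused as a cold-start value
WINDOW = 20           # hypothetical window size


def rolling_p_skill(log_path, window=WINDOW, prior=PRIOR_P_SKILL):
    """Success rate over the last `window` rows of a JSONL log."""
    path = Path(log_path)
    if not path.exists():
        return prior
    recent = deque(maxlen=window)  # keeps only the newest `window` outcomes
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                row = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip a partially written row
            if isinstance(row, dict) and "outcome" in row:
                recent.append(row["outcome"] == "success")
    if not recent:
        return prior
    return sum(recent) / len(recent)
```

A bounded window lets the estimate react to recent regressions instead of being averaged away by a long history of earlier successes.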
Deployment
Hetzner (89.167.35.145), three systemd units + nginx reverse proxy: / → Next.js prod, /api → FastAPI, /ws → FastAPI WebSocket upgrade. Nginx buffering is disabled for /api/sim/ so MJPEG streams in real time.
Acceptance (end-to-end script)
- Open http://89.167.35.145, see the ward scene (bed, nightstand, water cup, Stretch at starting pose)
- Type "I want water" → L0 through L5 light up sequentially → TaskSpec emits DRINK_WATER → all six behavior-tree nodes turn green
- MJPEG shows Stretch autonomously navigating, reaching toward the cup, closing the gripper, lifting, and returning to the bedside; each run recomputes IK rather than replaying a script
- Five factors update as the task progresses; PEA logs a new outcome=success row
- The Reset Sim button returns the scene to initial state within 2 seconds
Deferred to later versions
- Six-scene branching logic (CALL_HELP / ADJUST_TV / …) → v0.3
- Failure-fallback narratives → v0.3
- Videos for non–DRINK_WATER scenes → v0.4