V0.4 — Telling the Intent-to-Care Loop Clearly, in Simulation
V0.4 is the first milestone for SOMA Care as a standalone product line. Before this, V0.1–V0.3 lived in a separate research-prototype repo: a FastAPI + Next.js scaffold, the six-intent loop, a MuJoCo Stretch simulation, L0–L5 UI panels, and multi-camera MJPEG streams. All of it ran end to end, but only at "prototype works" quality: the visuals were coarse, the intent library was small, failure narratives were incomplete, and the material wasn't presentation-ready.
V0.4 isn't about adding new features. It's about making the existing loop clearer, finer, and richer, and then delivering it primarily as offline video. Four focus areas below.
1. Clearer simulation
The old stack rendered on a Hetzner cloud server without a GPU and topped out around 0.4 fps, just enough to stream MJPEG. V0.4 moves to local rendering on a Mac with Metal, targeting 30–60 fps during recording so video delivery has real headroom.
Concrete work:
- Integrate PBR assets from Hospital_assets (walls, floor, trim, medical-equipment textures)
- Three-tier lighting: directional daylight through the window + fluorescent ceiling panels + bedside lamp
- New first_person_nurse camera (eye height ~1.65 m, slight head bob when walking)
- Stretch RE3 model refinements: textured gripper pads, panel labels, finer hand mesh
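The head bob on the first_person_nurse camera could be modeled as a small sinusoidal offset on top of the eye height. The sketch below is illustrative only: the 1.65 m eye height comes from the plan above, while the bob amplitude, the step frequency, and the `camera_height` helper are assumptions rather than values or code from the project.

```python
import math

EYE_HEIGHT_M = 1.65       # eye height from the plan above
BOB_AMPLITUDE_M = 0.02    # assumed: 2 cm of vertical sway
STEP_FREQUENCY_HZ = 1.8   # assumed: walking cadence in steps per second


def camera_height(t: float, walking: bool) -> float:
    """Return the camera height in metres at time t (seconds).

    Hypothetical helper: while walking, the camera oscillates once per
    step around the eye height; while standing, it stays fixed.
    """
    if not walking:
        return EYE_HEIGHT_M
    phase = 2.0 * math.pi * STEP_FREQUENCY_HZ * t
    return EYE_HEIGHT_M + BOB_AMPLITUDE_M * math.sin(phase)
```

In a recording loop, the returned value would be written to the camera's vertical position once per frame; keeping the amplitude small keeps the bob from distracting viewers.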
2. Finer operations
The six base intents (DRINK_WATER, CALL_HELP, etc.) stay. Add 4–6 contact-rich multi-joint skills:
- TurnOver — rolling a patient onto their side: mocap-driven patient model + safety force envelope
- Feed / HandMedicine — precise grasp of spoon or pill cup, alignment to mouth or hand
- WipeMouth / StraightenBlanket — repeated-contact surface tasks
- CompanionChat — LLM-generated natural-language response + TTS, with optional lip-sync on a simple face mesh as a stretch goal
Add finer sub-indicators to MQA:
- Contact force vs. envelope (bounded against the spec declared in TaskSpec)
- Grasp margin (distance from closure point to object surface)
- Trajectory smoothness (jerk integral)
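The three sub-indicators above can each be reduced to a small numeric check. The following is a minimal sketch under stated assumptions, not ANIMA's actual MQA formulas: the function names are hypothetical, the object surface is approximated by sampled points, and smoothness is taken as the integral of squared jerk estimated with third-order finite differences on a uniformly sampled 1-D trajectory.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def force_within_envelope(forces_n: Sequence[float], max_force_n: float) -> bool:
    """True if every sampled contact force magnitude stays within the limit."""
    return all(abs(f) <= max_force_n for f in forces_n)


def grasp_margin(closure_point: Point, surface_points: Sequence[Point]) -> float:
    """Distance from the closure point to the nearest sampled surface point."""
    return min(math.dist(closure_point, p) for p in surface_points)


def jerk_integral(positions: Sequence[float], dt: float) -> float:
    """Approximate the integral of squared jerk over the trajectory.

    Jerk is estimated with the third forward difference divided by dt**3;
    lower values mean a smoother motion.
    """
    total = 0.0
    for i in range(len(positions) - 3):
        third_diff = (positions[i + 3] - 3 * positions[i + 2]
                      + 3 * positions[i + 1] - positions[i])
        jerk = third_diff / dt ** 3
        total += jerk * jerk * dt
    return total
```

A constant-acceleration trajectory has zero jerk, so `jerk_integral` returns 0.0 for it; any limit passed as `max_force_n` would come from the envelope declared in TaskSpec.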
3. More content
Content splits into three tracks:
Success paths (8–10): each one a full "utterance → L0–L5 readouts → robot action → post-assessment log" demo. Coverage spans simple loops like DRINK_WATER and CALL_HELP through contact-rich tasks like Feed and TurnOver.
Failure-negotiation narratives (3–5): e.g. occluded target, grasp slippage, low-confidence ITA intent. Each one shows three stages:
- Detection (which factor flagged the problem first)
- Negotiation (system requests confirmation, or offers an alternate object)
- Graceful degradation (fall back to a safer skill, or hand off to a human)
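The three stages can be read as a small state machine. The sketch below is one possible encoding, not the project's implementation: the `Stage` names, the `next_stage` function, and the extra `NOMINAL` and `RESUMED` states (a run before any flag, and a negotiation that succeeds) are assumptions added for illustration.

```python
from enum import Enum, auto


class Stage(Enum):
    NOMINAL = auto()      # assumed: normal execution, nothing flagged
    DETECTION = auto()    # a factor flagged the problem
    NEGOTIATION = auto()  # system asks for confirmation or offers an alternative
    DEGRADATION = auto()  # fall back to a safer skill or hand off to a human
    RESUMED = auto()      # assumed: negotiation succeeded, task continues


def next_stage(stage: Stage, factor_flagged: bool = False,
               user_confirmed: bool = False) -> Stage:
    """Advance the narrative by one step (hypothetical transition rules)."""
    if stage is Stage.NOMINAL:
        return Stage.DETECTION if factor_flagged else Stage.NOMINAL
    if stage is Stage.DETECTION:
        return Stage.NEGOTIATION
    if stage is Stage.NEGOTIATION:
        return Stage.RESUMED if user_confirmed else Stage.DEGRADATION
    # DEGRADATION and RESUMED are terminal in this sketch.
    return stage
```

For a grasp-slippage narrative, for example, the sequence would run NOMINAL, DETECTION, NEGOTIATION, and then either RESUMED or DEGRADATION depending on whether the user confirms the retry.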
Non-signal-path bypass channels:
- Physical E-stop button inside the scene (visible, clickable)
- Voice keyword "STOP" intercept at L0 (ASR pathway, independent from the LLM parser)
- Eye-closure / head-tilt visual cue (UI stubbed for now, wired in later)
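The voice "STOP" channel only works as a bypass if the keyword check runs before, and independently of, the LLM parser. A minimal sketch of that ordering is shown below; `route_transcript`, `parse_intent`, and `halt` are hypothetical names, and the whole-word, case-insensitive match is an assumption about how the keyword would be detected.

```python
import re
from typing import Callable

# Match "stop" as a whole word, in any letter case.
STOP_PATTERN = re.compile(r"\bstop\b", re.IGNORECASE)


def route_transcript(transcript: str,
                     parse_intent: Callable[[str], str],
                     halt: Callable[[], None]) -> str:
    """Check the ASR transcript for the stop keyword before any parsing.

    If the keyword is present, halt() is called and the parser is never
    invoked, so a slow or failing LLM call cannot delay the stop.
    """
    if STOP_PATTERN.search(transcript):
        halt()
        return "HALTED"
    return parse_intent(transcript)
```

Because the check is a plain regular expression, it has no dependency on the parser's availability or latency; words that merely contain "stop", such as "unstoppable", pass through to the parser.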
4. Video delivery
V0.4 delivery leads with offline, locally-rendered video rather than a live demo:
- Record 5–8 thematic videos of 3–5 minutes each (grouped by success / failure / bypass content above)
- Render locally to MP4, publish to GitHub Releases, mirror to YouTube (unlisted) and soma.jeffliulab.com
- Cut a 1–2 minute highlight reel for the homepage hero
The original Hetzner live demo (89.167.35.145) stays accessible but is no longer the primary showcase.
Where ANIMA fits
SOMA Care depends directly on ANIMA v0.1.0, the first reference implementation that takes ANIMA from a skeleton to a usable framework. All six layers (L0 Signal / L1 Parser / L2 Planner / L3 Skill / L4 Adapter / L5 Assessment) now have working code; the five factors (ITA / MQA / SQA / GOA / PEA) have concrete formulas; and LLM-as-Parser with forced tool-calling is wired up. soma-care consumes it as a pip dependency (anima @ git+https://github.com/jeffliulab/anima.git@main) rather than embedding a copy.
This also means SOMA Care and SOMA Arm share the same cognition stack: Care runs nursing intents inside the simulated ward, Arm runs chess-board manipulation on real hardware — both on top of ANIMA.
V0.5+ stays uncommitted
V0.5 direction stays open until V0.4 videos ship and a round of feedback comes back. Candidate directions include non-invasive BCI (EEG / fNIRS) integration in simulation, real-hardware migration onto a physical Stretch RE3, and a clinical partner pilot — each requires external hardware, funding, or a compliance channel, and none are committed today.