ANIMA V0.0 — Skeleton Era: Separating the Brain from the Body

This log records the skeleton-era state before the 2026-04-21 absorption of the reference implementation: module boundaries were drawn and the public narrative was locked, but there was little runnable code in the repository yet. For the current version (v0.1.0 first reference implementation), see the v0.1 entry.

Most robotics projects put language understanding, task planning, and hardware control in the same codebase. It feels natural at first -- everything runs on the same machine anyway. But the moment you try to move that cognition to a different robot, you discover that the language parsing logic is entangled with a specific arm's joint limits, and nothing can be cleanly extracted.

ANIMA was designed from day one to avoid this trap. It is a standalone cognition framework -- its own repository, its own Python library, not living inside any robot project's subdirectory. SOMA Arm is the robot that currently hosts its reference implementation, but ANIMA itself has no dependency on SOMA's hardware specifics. If the robot were swapped out tomorrow for something entirely different, ANIMA's parser, planner, and validator should carry over directly -- only a new skill adapter layer would need to be written.

Four Core Modules

ANIMA's architecture is organized around four modules, each with clear input/output boundaries:

Parser takes a natural-language instruction and outputs a structured TaskSpec. The key design choice is LLM-as-Parser rather than LLM-as-Translator -- the LLM does not generate robot commands directly, but compresses open language into an inspectable, traceable intermediate structure. The benefit is that every downstream step can be audited rather than blindly trusting the LLM's output.
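
As a rough illustration of what "inspectable intermediate structure" could mean in practice, here is a minimal sketch of a TaskSpec and a check applied to the LLM's reply before anything downstream sees it. The field names (action, target, destination, source_text), the ALLOWED_ACTIONS set, and the parse_llm_output helper are all assumptions made for this example; this log does not define the actual schema. No model is called: a hard-coded JSON string stands in for the reply.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical action vocabulary; the real set is not specified in this log.
ALLOWED_ACTIONS = {"pick", "place", "push"}


@dataclass(frozen=True)
class TaskSpec:
    """Assumed shape of the intermediate structure the parser emits."""
    action: str
    target: str
    destination: Optional[str] = None
    source_text: str = ""  # original instruction, kept for traceability


def parse_llm_output(raw: str, instruction: str) -> TaskSpec:
    """Turn the LLM's JSON reply into a TaskSpec, rejecting malformed replies."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"LLM reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("LLM reply must be a JSON object")

    action = data.get("action")
    target = data.get("target")
    destination = data.get("destination")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if not isinstance(target, str) or not target:
        raise ValueError("target must be a non-empty string")
    if destination is not None and not isinstance(destination, str):
        raise ValueError("destination must be a string when present")
    return TaskSpec(action, target, destination, source_text=instruction)


# Stand-in for a model reply; no LLM is called in this sketch.
reply = '{"action": "pick", "target": "red block"}'
spec = parse_llm_output(reply, "pick up the red block")
print(spec)
```

Because the reply is validated against a closed vocabulary, an unexpected action raises an error at the parser boundary instead of reaching the planner.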

Planner takes the TaskSpec and orchestrates task decomposition and execution through behavior trees (py_trees). Behavior trees natively support conditional branching, retry, and fallback, making them better suited to real-world uncertainty than linear command sequences.
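
To make the retry and fallback behavior concrete, the sketch below builds a tiny behavior tree in plain Python. It is deliberately not py_trees: the Node, Action, Sequence, Fallback, and Retry classes are simplified stand-ins written for this example, they tick synchronously, and they omit the RUNNING status that real behavior-tree libraries provide. The grasp, place, and ask-for-help leaves are hypothetical.

```python
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


class Node:
    def tick(self) -> Status:
        raise NotImplementedError


class Action(Node):
    """Leaf node wrapping a callable that returns True on success."""
    def __init__(self, name: str, fn: Callable[[], bool]):
        self.name, self.fn = name, fn

    def tick(self) -> Status:
        return Status.SUCCESS if self.fn() else Status.FAILURE


class Sequence(Node):
    """Succeeds only if every child succeeds, in order."""
    def __init__(self, children: List[Node]):
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            if child.tick() is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS


class Fallback(Node):
    """Tries children in order and succeeds on the first success."""
    def __init__(self, children: List[Node]):
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            if child.tick() is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


class Retry(Node):
    """Re-ticks its child until it succeeds or attempts run out."""
    def __init__(self, child: Node, max_attempts: int):
        self.child, self.max_attempts = child, max_attempts

    def tick(self) -> Status:
        for _ in range(self.max_attempts):
            if self.child.tick() is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


log: List[str] = []
attempts = {"grasp": 0}


def flaky_grasp() -> bool:
    # Hypothetical skill that fails twice before succeeding.
    attempts["grasp"] += 1
    log.append(f"grasp attempt {attempts['grasp']}")
    return attempts["grasp"] >= 3


def place() -> bool:
    log.append("place")
    return True


def ask_for_help() -> bool:
    log.append("ask for help")
    return True


tree = Fallback([
    Sequence([Retry(Action("grasp", flaky_grasp), max_attempts=3),
              Action("place", place)]),
    Action("ask_for_help", ask_for_help),
])
result = tree.tick()
print(result, log)
```

Here the grasp is retried until it succeeds, so the fallback branch is never reached. With max_attempts=2 the sequence would fail and the tree would fall through to asking for help, which a linear command list cannot express without extra control code.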

Skill Registry is the interface layer between the cognition stack and the robot body. It defines which skill primitives are callable (e.g., pick, place, push) and what preconditions and expected effects each skill carries. The framework only invokes skills through the registry -- it never touches joint angles, serial protocols, or other hardware-level details directly.
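
One way to picture the registry is as a table of named skills, each carrying symbolic preconditions and effects, with the hardware call injected from the robot side. The sketch below assumes a world state modeled as a set of fact strings and uses hypothetical names throughout (Skill, SkillRegistry, invoke, and facts like "gripper_empty"); none of these are confirmed to match ANIMA's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Set

# Hypothetical world model: a set of symbolic facts such as "gripper_empty".
WorldState = Set[str]


@dataclass(frozen=True)
class Skill:
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    remove_effects: FrozenSet[str]
    execute: Callable[[], bool]  # supplied by the robot-side adapter


class SkillRegistry:
    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        if skill.name in self._skills:
            raise ValueError(f"skill already registered: {skill.name}")
        self._skills[skill.name] = skill

    def invoke(self, name: str, state: WorldState) -> WorldState:
        """Run a skill if its preconditions hold; return the expected next state."""
        skill = self._skills[name]  # KeyError for unregistered skills
        missing = skill.preconditions - state
        if missing:
            raise RuntimeError(f"{name}: unmet preconditions {sorted(missing)}")
        if not skill.execute():
            raise RuntimeError(f"{name}: adapter reported failure")
        return (state - skill.remove_effects) | skill.add_effects


registry = SkillRegistry()
registry.register(Skill(
    name="pick",
    preconditions=frozenset({"gripper_empty", "object_on_table"}),
    add_effects=frozenset({"holding_object"}),
    remove_effects=frozenset({"gripper_empty", "object_on_table"}),
    execute=lambda: True,  # fake adapter standing in for real hardware
))

state = registry.invoke("pick", {"gripper_empty", "object_on_table"})
print(state)  # {'holding_object'}
```

The framework side only ever sees skill names and symbolic facts. Everything hardware-specific lives behind the execute callable, which is why a fake adapter is enough to exercise the registry in a plain Python test.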

Validator re-observes the world state after each skill execution to verify whether the outcome matches expectations. If a pick action claims success, the validator uses vision to check whether the target object actually left its original position. Verification failure triggers retry, rollback, or natural-language feedback -- the system does not silently pretend success.
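
The observe-execute-observe loop can be sketched as follows. This is an illustration under stated assumptions, not the validator's real design: observations are modeled as a dictionary from object name to location, the vision system is replaced by a function that reads a fake world, and run_with_validation, Outcome, and the check callback are hypothetical names. Only retry and feedback are shown; rollback is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical observation format: object name -> observed location.
Observation = Dict[str, str]


@dataclass
class Outcome:
    ok: bool
    attempts: int
    message: str


def run_with_validation(
    execute: Callable[[], bool],
    observe: Callable[[], Observation],
    check: Callable[[Observation, Observation], bool],
    max_attempts: int = 2,
) -> Outcome:
    """Execute a skill, then re-observe and verify instead of trusting its claim."""
    for attempt in range(1, max_attempts + 1):
        before = observe()
        claimed = execute()
        after = observe()
        if claimed and check(before, after):
            return Outcome(True, attempt, "verified")
    return Outcome(
        False,
        max_attempts,
        "The skill finished, but the observed state did not change as expected.",
    )


# Fake world standing in for the camera: the first pick silently fails.
world = {"pawn": "e4"}
calls = {"n": 0}


def fake_pick() -> bool:
    calls["n"] += 1
    if calls["n"] >= 2:
        world["pawn"] = "gripper"
    return True  # always claims success, even when nothing moved


def fake_observe() -> Observation:
    return dict(world)  # copy, so before/after snapshots stay independent


def pawn_left_square(before: Observation, after: Observation) -> bool:
    return before["pawn"] != after["pawn"]


outcome = run_with_validation(fake_pick, fake_observe, pawn_left_square)
print(outcome)
```

The first attempt reports success without moving the pawn, the check catches it, and the second attempt is verified. If every attempt fails the check, the returned message is the kind of text that could be surfaced to the user as natural-language feedback.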

Why Not Just Use Stockfish

ANIMA's design includes a modular game engine architecture. The first plugin to be implemented is a chess rule engine responsible for legal move generation and capture detection. A natural question arises: with mature chess engines like Stockfish available, why build anything custom?

The answer lies in a difference of goals. ANIMA's game engine framework is not about making the robot play optimal chess; strong engines such as Stockfish already cover move selection. It is about building a general board-game interface layer: given the current board state and a natural-language instruction, determine whether the requested operation is legal, generate the list of pieces that need to move, and detect whether a capture is triggered. This interface should work for chess, Chinese chess, Go, or any other board game by swapping in a different rule plugin.

Stockfish is built to search for strong chess moves; it has no notion of a natural-language instruction and no path to other board games. ANIMA needs a framework that understands "what does the user want to do" and judges "is this legal under the rules."

Architecturally, the game engine is designed as a plugin system: the core framework defines interfaces for board state representation, move generation, and outcome evaluation. Each specific game type only needs to implement these three interfaces to plug in. The chess rule engine is the first plugin, but the framework itself makes no assumption that chess is the only game.
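
A minimal sketch of such a plugin contract is shown below, assuming the three interfaces map onto three methods: initial_state (board representation), legal_moves (move generation), and outcome (outcome evaluation). The GamePlugin base class, the is_legal helper, and the method names are hypothetical. To keep the example short, the plugin implemented here is tic-tac-toe rather than chess; it is a toy stand-in, not the chess rule engine described above.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple

Board = Tuple[str, ...]  # hypothetical representation: 9 cells of "X", "O" or " "


class GamePlugin(ABC):
    """Assumed contract: state representation, move generation, outcome evaluation."""

    @abstractmethod
    def initial_state(self) -> Board: ...

    @abstractmethod
    def legal_moves(self, state: Board) -> List[int]: ...

    @abstractmethod
    def outcome(self, state: Board) -> Optional[str]: ...


class TicTacToe(GamePlugin):
    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

    def initial_state(self) -> Board:
        return (" ",) * 9

    def legal_moves(self, state: Board) -> List[int]:
        # No moves are legal once the game has ended.
        if self.outcome(state) is not None:
            return []
        return [i for i, cell in enumerate(state) if cell == " "]

    def outcome(self, state: Board) -> Optional[str]:
        # Returns the winner's mark, "draw", or None while the game is ongoing.
        for a, b, c in self.LINES:
            if state[a] != " " and state[a] == state[b] == state[c]:
                return state[a]
        return "draw" if " " not in state else None


def is_legal(plugin: GamePlugin, state: Board, move: int) -> bool:
    """Framework-side check that only talks to the plugin interface."""
    return move in plugin.legal_moves(state)


game = TicTacToe()
start = game.initial_state()
finished = ("X", "X", "X", "O", "O", " ", " ", " ", " ")

print(is_legal(game, start, 4))     # True
print(game.outcome(finished))       # X
print(is_legal(game, finished, 5))  # False
```

The is_legal function never mentions tic-tac-toe. Swapping in a plugin for another game would leave it unchanged, which is the property the plugin architecture is after.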

The Boundary Between Framework and Robot

A large portion of V0.0's effort went into something that does not produce runnable code: drawing a clean boundary between what belongs to ANIMA and what belongs to the robot.

What belongs to the framework? Language parsing, task structure, execution validation, skill registration -- these are independent of which physical arm is being used.

What belongs to the robot? Joint parameters, sensor calibration, motion planning constraints, gripper physics -- these are hardware-specific and differ from one robot to the next.

In SOMA Arm's codebase, this boundary manifests as a thin ROS 2 wrapper node: it translates ANIMA's Python API calls into ROS topics and services, but ANIMA itself does not import any ROS dependencies. The practical benefit is that ANIMA can be developed and tested in a pure Python environment without spinning up a full ROS 2 workspace every time.

Why a Separate Repository

Splitting the cognition framework into its own repository is not just about code tidiness. It solves a more practical problem: keeping two workflows with very different cadences from interfering with each other.

SOMA Arm's development pace is intense -- drivers, serial ports, and camera debugging happen daily. Its code changes frequently, and most changes are hardware-specific. If ANIMA lived inside SOMA's subdirectory, every hardware-only refactor in SOMA would pollute ANIMA's commit history; conversely, every cognitive-layer interface change in ANIMA would leave irrelevant noise in the robot repository.

A separate repo means independent version history, independent release cadence, and independent dependency management. SOMA Arm calls ANIMA's Python API through a thin ROS 2 wrapper node, and the only contract between the two is the TaskSpec data format and the Skill Registry interface. This boundary is narrow enough that swapping in a different robot only requires rewriting the wrapper and skill adapter, without touching a single line of the framework itself.

Current Status

V0.0 is ANIMA's skeleton version. The boundaries of the four core modules are defined, the public narrative is locked, and the game engine's plugin architecture is designed. But to be candid, there is not much runnable code in the repository yet -- the parser's prompt engineering, the behavior tree's concrete node implementations, and the validator's visual checking logic have not been built out.

This is intentional. Before SOMA Arm's lower layers (driver, teleop, camera, calibration) are fully stable, building out ANIMA's implementation prematurely would only produce code that needs to be rewritten in two weeks. The current strategy is to let SOMA stabilize the hardware and perception layers first while ANIMA maintains clean interface definitions and architecture boundaries, then fill in the implementation rapidly when the integration window arrives.

What Comes Next

Once SOMA Arm completes its perception foundation (V1.01) and chess piece manipulation (V1.02), ANIMA will enter its real implementation phase: the parser needs to turn "capture the pawn on e4" into a TaskSpec, the chess engine needs to validate legality, the behavior tree needs to orchestrate the full capture sequence, and the validator needs to visually confirm that the piece was correctly removed. That is when ANIMA goes from skeleton to a cognition layer that actually drives a robot.

Tech Stack

Python 3.10 / Claude API (LLM-as-Parser) / py_trees / ROS 2 Humble (wrapper layer only) / Modular game engine architecture