Chameleon: Control-Indexed Prospective Memory for Visuomotor Manipulation

Robots often observe information that determines a future action long before that action is executed. In a shell game, for example, a robot first sees which cup hides the ball, watches the cups move, and only later needs to choose the correct cup. The final observation alone is not enough for a decision: the correct action depends on an earlier event. We refer to this temporal gap as observation-action delay. It makes memory a policy-facing problem: a policy must keep similar histories distinct, retrieve the past event relevant to the current decision, and convert that recall into an action-ready state. We call these requirements separability, addressability, and prospectiveness, respectively. We introduce Chameleon, a ~60M-parameter visuomotor policy for control-indexed prospective memory. Chameleon writes embodied event memory, preserves separable histories, retrieves control-relevant traces, and trains the resulting working state to be prospective. We also introduce Camo-Dataset, a real-robot benchmark that isolates observation-action delay by making the decision scene visually ambiguous, so the correct action must be inferred from earlier observations. Chameleon improves decision/end-to-end success on Camo-Dataset from 22.5%/21.3% to 80.8%/71.3%. On public long-horizon memory benchmarks, it achieves 87.1% ± 0.8% on LIBERO-10, 97.3% ± 4.5% on MemoryBench, and 75.1% ± 1.4% on MIKASA-Robo, setting the state of the art for same-size models and exceeding multiple larger VLA baselines under the reported protocols. Probes and ablations show that Chameleon learns separable, addressable, and prospective memory, and that these properties drive its performance gains.
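The shell-game example can be made concrete with a toy simulation. The sketch below is not Chameleon and is not taken from the paper; it is a minimal illustration, under simplifying assumptions (symbolic cups, perfectly observed swaps), of why a policy that sees only the final, visually ambiguous scene cannot beat chance, while a policy that carries the earlier observation forward can. All names (`play_shell_game`, `MemorylessPolicy`, `TrackingPolicy`, `success_rate`) are hypothetical.

```python
import random


class MemorylessPolicy:
    """Acts only on the final scene, where all cups look identical."""

    def observe_start(self, ball_cup):
        return None  # the early observation is discarded

    def observe_swap(self, state, i, j):
        return None  # nothing is carried across time

    def choose(self, state, num_cups, rng):
        return rng.randrange(num_cups)  # ambiguous scene: can only guess


class TrackingPolicy:
    """Carries the early observation forward as a one-integer memory."""

    def observe_start(self, ball_cup):
        return ball_cup  # write the decisive event into memory

    def observe_swap(self, state, i, j):
        # Keep the memory consistent with what happens to the cups.
        if state == i:
            return j
        if state == j:
            return i
        return state

    def choose(self, state, num_cups, rng):
        return state  # the memory is already an action-ready answer


def play_shell_game(policy, rng, num_cups=3, num_swaps=5):
    """Run one episode and return True if the policy picks the ball's cup."""
    ball = rng.randrange(num_cups)  # observed long before the decision
    state = policy.observe_start(ball)
    for _ in range(num_swaps):
        i, j = rng.sample(range(num_cups), 2)  # two distinct cups swap places
        if ball == i:
            ball = j
        elif ball == j:
            ball = i
        state = policy.observe_swap(state, i, j)
    return policy.choose(state, num_cups, rng) == ball


def success_rate(policy, trials=2000, seed=0):
    """Fraction of episodes in which the policy picks the correct cup."""
    rng = random.Random(seed)
    wins = sum(play_shell_game(policy, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    print("memoryless:", success_rate(MemorylessPolicy()))
    print("tracking:  ", success_rate(TrackingPolicy()))
```

In this toy setting the tracking policy always succeeds and the memoryless policy stays near the 1-in-3 chance level. A real visuomotor policy must learn what to store and how to retrieve it from images rather than being handed a symbolic cup index.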
