Robots solving generalist tasks must ground instructions in their past experience, since humans may refer to notable past events when giving a task (e.g., ``Take me to where the chemical spill happened yesterday''). Because memory limits make storing all past events infeasible, long-term robot memory must be selective, ideally retaining only those episodes with high utility for future tasks. However, a generalist robot's future tasks are typically not known a priori. To select generically useful memories, we propose Bayesian surprise as a gating mechanism for memory formation. We present an approach to compute surprise in a semantically rich, deployment-agnostic latent space provided by V-JEPA-2. Using our gated episodic memory to augment a 4D scene graph-based spatial memory, we show consistent improvements over the state of the art on robot question-answering benchmarks, outperforming prior robot memory methods by $\geq 12\%$ on temporal, spatial, and binary questions. On event segmentation tasks, our unsupervised, causal method surpasses supervised and non-causal methods.
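To make the gating idea concrete, the following is a minimal sketch of surprise-gated memory formation, not the method evaluated in this work. It assumes a diagonal Gaussian belief over the latent state, a conjugate update with a fixed observation variance, and a fixed threshold; Bayesian surprise is taken as the KL divergence from the prior to the posterior belief. The names `SurpriseGate`, `gaussian_kl`, `obs_var`, `drift_var`, and `threshold` are hypothetical, and the input vectors stand in for V-JEPA-2 embeddings, which are not computed here.

```python
import numpy as np


def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between diagonal Gaussians, summed over dimensions."""
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )


class SurpriseGate:
    """Illustrative gate: store an episode when the belief update is large.

    Assumption: the belief over the latent state is a diagonal Gaussian,
    updated with a Kalman-style conjugate rule. This is a stand-in model,
    not the predictor described in the abstract.
    """

    def __init__(self, dim, obs_var=1.0, drift_var=0.01, threshold=1.0):
        self.mu = np.zeros(dim)      # belief mean
        self.var = np.ones(dim)      # belief variance (diagonal)
        self.obs_var = obs_var       # assumed observation noise variance
        self.drift_var = drift_var   # variance inflation so the belief stays adaptive
        self.threshold = threshold   # hypothetical gating threshold (in nats)

    def update(self, z):
        """Return (surprise, store) for latent vector z and update the belief."""
        z = np.asarray(z, dtype=float)
        # Prior for this step: previous posterior with inflated variance.
        prior_mu = self.mu
        prior_var = self.var + self.drift_var
        # Conjugate Gaussian posterior after observing z.
        post_var = 1.0 / (1.0 / prior_var + 1.0 / self.obs_var)
        post_mu = post_var * (prior_mu / prior_var + z / self.obs_var)
        # Bayesian surprise: how far the observation moved the belief.
        surprise = gaussian_kl(post_mu, post_var, prior_mu, prior_var)
        self.mu, self.var = post_mu, post_var
        return surprise, surprise > self.threshold


if __name__ == "__main__":
    gate = SurpriseGate(dim=4)
    for _ in range(20):
        gate.update(np.zeros(4))          # familiar observations
    surprise, store = gate.update(np.full(4, 10.0))  # abrupt change
    print(f"surprise={surprise:.2f}, store={store}")
```

In this sketch, a run of similar latents shrinks the belief variance, so a later abrupt change produces a large KL term and triggers storage. A deployed system would likely need a learned predictor and a calibrated or adaptive threshold rather than the fixed values assumed here.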