Vision-language-action (VLA) policies can deviate from nominal trajectories during manipulation, even when the task remains physically feasible. Recovering from these deviations is challenging because they push the policy into unfamiliar regions of the state space, where direct re-planning frequently destabilizes action sequences. We propose Back to the Familiar Future (B2FF), a recovery framework for foresight-driven VLAs that uses future visual conditioning as a recovery interface. Before execution, the VLA generates a milestone bank of familiar future states conditioned on the clean initial observation. At recovery time, a recoverability-aware selector chooses a recovery milestone from this bank and enforces it as a fixed visual goal, which enables the VLA to robustly map off-trajectory observations back to a familiar future. On failure-injected LIBERO, with recovery timing controlled to align with the injected failure, B2FF increases the average success rate of a baseline VLA from 56.3% to 74.0% (a gain of 17.7 percentage points), demonstrating that pre-imagined milestones can guide recovery without fine-tuning the low-level action generator.
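The two-stage procedure described in the abstract (build a milestone bank before execution, then pick one milestone as a fixed visual goal at recovery time) can be illustrated with a minimal Python sketch. Everything named here is a hypothetical stand-in rather than the authors' implementation: `imagine_future` plays the role of the VLA's foresight module, images are reduced to plain feature vectors, and the selection rule (nearest milestone that is not behind the current task progress) is only one plausible reading of "recoverability-aware", since the abstract does not specify the criterion.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Features = Sequence[float]  # assumption: an image is reduced to a feature vector


@dataclass
class Milestone:
    step: int          # nominal time step the imagined frame corresponds to
    image: Features    # imagined future observation (hypothetical representation)


def l1(a: Features, b: Features) -> float:
    """L1 distance between two feature vectors (illustrative choice)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def build_milestone_bank(
    imagine_future: Callable[[Features, int], Features],
    initial_obs: Features,
    horizon: int,
    stride: int,
) -> List[Milestone]:
    """Before execution: imagine future states from the clean initial observation."""
    return [
        Milestone(t, imagine_future(initial_obs, t))
        for t in range(stride, horizon + 1, stride)
    ]


def select_milestone(
    bank: List[Milestone],
    current_obs: Features,
    progress_step: int,
    distance: Callable[[Features, Features], float] = l1,
) -> Milestone:
    """At recovery time: pick one milestone to hold fixed as the visual goal.

    Hypothetical rule: ignore milestones behind the current progress, then
    take the one closest to the current (off-trajectory) observation.
    """
    candidates = [m for m in bank if m.step >= progress_step] or bank
    return min(candidates, key=lambda m: distance(current_obs, m.image))


# Toy usage: the "foresight model" simply shifts a 1-D feature by the step count.
bank = build_milestone_bank(lambda obs, t: [obs[0] + t], [0.0], horizon=10, stride=5)
goal = select_milestone(bank, current_obs=[7.0], progress_step=6)
```

In this toy setup the bank holds milestones at steps 5 and 10, and the selected `goal` would then be passed to the unchanged low-level action generator as its visual condition until the policy returns to the nominal trajectory.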