Learning-based manipulation policies have made substantial progress in real-world robot manipulation, particularly for short-horizon action generation. However, deployment in open workspaces remains fragile under unexpected local scene dynamics, such as moving objects, transient occlusions, or disturbances near the intended motion. Existing runtime monitors often rely on global observation anomalies, policy uncertainty, or frame-level visual changes, and struggle to distinguish task-relevant execution risk from benign visual variation. We introduce PATCH, an action-chunk-conditioned latent patch innovation monitor for deployment-time intervention. Given the active action chunk, PATCH defines a projected execution corridor, predicts latent patch evolution inside it, and accumulates persistent residuals unexplained by the robot's own motion. These residuals form a localized intervention signal that allows PATCH-Router to pause execution, select an available recovery source, and resume the original policy once localized innovation subsides. Experiments on real robot rollout data show that PATCH produces more stable and context-relevant triggers than competing runtime monitors. Real-robot deployment further demonstrates monitor-driven intervention and policy resumption for disturbance-aware manipulation. Project Page: https://yananzhou5555.github.io/PATCH/.
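The monitoring loop described above (a corridor derived from the active action chunk, per-patch residuals inside it, and accumulation of residuals that persist) can be illustrated with a minimal sketch. This is not the authors' implementation: the function and class names, the circular corridor test, the exponential accumulator, and the two hysteresis thresholds are all assumptions made for illustration. It also assumes the predicted latent patches come from some external predictor that is not shown here.

```python
import numpy as np


def corridor_mask(waypoints_px, grid_hw, patch_px, radius_px):
    """Hypothetical corridor: mark patches whose centre lies within
    radius_px of any projected action-chunk waypoint (x, y in pixels)."""
    waypoints = np.asarray(waypoints_px, dtype=float)          # (K, 2)
    h, w = grid_hw
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Patch centres in pixel coordinates, stored as (x, y).
    centres = np.stack([(xs + 0.5) * patch_px, (ys + 0.5) * patch_px], axis=-1)
    # Distance from every patch centre to every waypoint: shape (h, w, K).
    dists = np.linalg.norm(centres[:, :, None, :] - waypoints[None, None], axis=-1)
    return dists.min(axis=-1) <= radius_px


class InnovationMonitor:
    """Assumed accumulator: exponential moving average of in-corridor
    residuals, with separate pause and resume thresholds (hysteresis)."""

    def __init__(self, decay=0.8, on_thresh=1.0, off_thresh=0.3):
        self.decay = decay
        self.on_thresh = on_thresh
        self.off_thresh = off_thresh
        self.score = 0.0
        self.paused = False

    def update(self, predicted, observed, mask):
        # Per-patch residual between predicted and observed latents: (h, w).
        residual = np.linalg.norm(observed - predicted, axis=-1)
        # Only patches inside the corridor contribute to the signal.
        innovation = float(residual[mask].mean()) if mask.any() else 0.0
        # Accumulate so that only persistent residuals raise the score.
        self.score = self.decay * self.score + (1.0 - self.decay) * innovation
        if not self.paused and self.score > self.on_thresh:
            self.paused = True      # request a pause / recovery source
        elif self.paused and self.score < self.off_thresh:
            self.paused = False     # innovation subsided: resume the policy
        return self.paused
```

In this sketch, a change outside the corridor never affects the score, and a single-frame change inside it is damped by the accumulator, which mirrors the abstract's distinction between task-relevant execution risk and benign visual variation. The threshold and decay values are placeholders and would need tuning on real rollout data.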