Current work on robot failure detection and correction typically operates in a post hoc manner, analyzing errors and applying corrections only after failures occur. This work introduces CycleVLA, a system that equips Vision-Language-Action models (VLAs) with proactive self-correction: the capability to anticipate incipient failures and recover before they fully manifest during execution. CycleVLA achieves this by integrating three components: a progress-aware VLA that flags critical subtask transition points, where failures most frequently occur; a failure predictor and planner built on a vision-language model (VLM) that triggers subtask backtracking upon predicted failure; and a test-time scaling strategy based on Minimum Bayes Risk (MBR) decoding that improves retry success after backtracking. Extensive experiments show that CycleVLA improves performance for both well-trained and under-trained VLAs, and that MBR serves as an effective zero-shot test-time scaling strategy for VLAs. Project Page: https://dannymcy.github.io/cyclevla/
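MBR decoding, in general, samples several candidate outputs and keeps the one with the lowest expected loss measured against the other samples, so it favors a consensus candidate over outliers. The sketch below illustrates that selection rule for action chunks. It assumes the VLA can be sampled N times to produce action chunks of equal shape and uses Euclidean distance between chunks as the loss. The helper names `chunk_distance` and `mbr_select`, the chunk layout, and the choice of loss are hypothetical illustrations, not CycleVLA's actual implementation.

```python
import math
from typing import Sequence

# Hypothetical layout: an action chunk is T timesteps, each a D-dimensional action.
ActionChunk = Sequence[Sequence[float]]


def chunk_distance(a: ActionChunk, b: ActionChunk) -> float:
    """Euclidean distance between two equally shaped action chunks (assumed loss)."""
    return math.sqrt(
        sum((x - y) ** 2 for step_a, step_b in zip(a, b) for x, y in zip(step_a, step_b))
    )


def mbr_select(candidates: Sequence[ActionChunk]) -> int:
    """Return the index of the candidate with minimum empirical Bayes risk.

    The risk of a candidate is its average distance to every other sampled
    candidate, i.e. the samples themselves stand in for the model's distribution.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    best_idx, best_risk = 0, math.inf
    for i, ci in enumerate(candidates):
        losses = [chunk_distance(ci, cj) for j, cj in enumerate(candidates) if j != i]
        risk = sum(losses) / len(losses) if losses else 0.0
        if risk < best_risk:
            best_idx, best_risk = i, risk
    return best_idx


if __name__ == "__main__":
    # Three sampled 2-step, 2-D action chunks; the third is an outlier.
    samples = [
        [[0.0, 0.0], [0.1, 0.0]],
        [[0.0, 0.1], [0.1, 0.1]],
        [[1.0, 1.0], [1.1, 1.0]],
    ]
    print(mbr_select(samples))  # prints 1
```

Because the risk is computed only from the model's own samples, this kind of selection needs no extra training or reward model, which is consistent with describing MBR as a zero-shot test-time scaling strategy; the cost is N forward passes plus O(N²) pairwise distance computations.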