Automatic detection and prevention of open-set failures are crucial in closed-loop robotic systems. Existing methods often struggle to both identify unexpected failures reactively, after they occur, and prevent foreseeable ones proactively. To this end, we propose Code-as-Monitor (CaM), a novel paradigm that leverages a vision-language model (VLM) for both open-set reactive and proactive failure detection. The core of our method is to formulate both tasks as a unified set of spatio-temporal constraint satisfaction problems and to evaluate them with VLM-generated code for real-time monitoring. To improve the accuracy and efficiency of monitoring, we further introduce constraint elements, which abstract constraint-related entities or their parts into compact geometric elements. This approach offers greater generality, simplifies tracking, and facilitates constraint-aware visual programming by using these elements as visual prompts. Experiments across three simulators and a real-world setting show that, under severe disturbances, CaM achieves a 28.7% higher success rate and reduces execution time by 31.8% compared to baselines. Moreover, CaM can be integrated with open-loop control policies to form closed-loop systems, enabling long-horizon tasks in cluttered scenes with dynamic environments.
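To make the unified reactive/proactive formulation more concrete, here is a minimal sketch under loudly stated assumptions. It is not CaM's actual generated code. We assume each constraint element is reduced to a named 3D point, and each constraint is a function that returns a signed margin (non-negative when satisfied, negative when violated). A reactive failure is flagged when any margin in the latest frame is already negative. A proactive failure is flagged when linearly extrapolating a margin `horizon` frames ahead predicts a violation. The names `grasp_kept`, `monitor`, the `horizon` parameter, the 0.05 m tolerance, and the linear-extrapolation rule are all hypothetical. In CaM the constraint code is generated by a VLM; here it is written by hand.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation: each frame maps a constraint-element name
# (an entity or part abstracted to a 3D point) to its position in meters.
Point = Tuple[float, float, float]
Frame = Dict[str, Point]

# A constraint returns a signed margin: >= 0 means satisfied, < 0 violated.
Constraint = Callable[[Frame], float]


def grasp_kept(frame: Frame, tol: float = 0.05) -> float:
    """Illustrative constraint: the cup point stays within `tol` of the gripper point."""
    return tol - math.dist(frame["gripper"], frame["cup"])


def monitor(history: Sequence[Frame], constraints: List[Constraint],
            horizon: int = 3) -> str:
    """Classify the latest frame as 'reactive_failure', 'proactive_failure', or 'ok'.

    Reactive: some constraint is already violated in the latest frame.
    Proactive: nothing is violated yet, but linearly extrapolating a
    constraint's margin `horizon` frames ahead predicts a violation.
    """
    current = history[-1]
    margins = [c(current) for c in constraints]
    if any(m < 0 for m in margins):
        return "reactive_failure"
    if len(history) >= 2:
        previous = history[-2]
        for c, m in zip(constraints, margins):
            rate = m - c(previous)  # per-frame change in the margin
            if m + horizon * rate < 0:
                return "proactive_failure"
    return "ok"


# Usage: the cup slowly slips away from the gripper over three frames.
frames = [
    {"gripper": (0.0, 0.0, 0.50), "cup": (0.0, 0.0, 0.48)},
    {"gripper": (0.0, 0.0, 0.50), "cup": (0.0, 0.0, 0.46)},
    {"gripper": (0.0, 0.0, 0.50), "cup": (0.0, 0.0, 0.44)},
]
print(monitor(frames[:1], [grasp_kept]))  # ok
print(monitor(frames[:2], [grasp_kept]))  # proactive_failure
print(monitor(frames, [grasp_kept]))      # reactive_failure
```

Expressing every check as a signed margin over a few compact geometric elements lets the same code serve both purposes. The sign of the margin drives the reactive check, and its trend drives the proactive one. This loosely mirrors the abstract's point that compact elements simplify tracking and evaluation.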