Foundation models, e.g., large language models (LLMs), trained on internet-scale data possess zero-shot generalization capabilities that make them a promising technology for detecting and mitigating out-of-distribution failure modes of robotic systems. Fully realizing this promise, however, poses two challenges: (i) mitigating the considerable computational expense of these models so that they can be applied online, and (ii) incorporating their judgment regarding potential anomalies into a safe control framework. In this work, we present a two-stage reasoning framework: the first stage is a fast binary anomaly classifier that analyzes observations in an LLM embedding space, which may then trigger a slower fallback selection stage that utilizes the reasoning capabilities of generative LLMs. These stages correspond to branch points in a model predictive control strategy that, as soon as an anomaly is detected, maintains the joint feasibility of continuing along multiple fallback plans to account for the slow reasoner's latency, thus ensuring safety. We show that our fast anomaly classifier outperforms autoregressive reasoning with state-of-the-art GPT models, even when instantiated with relatively small language models. This enables our runtime monitor to improve the trustworthiness of dynamic robotic systems, such as quadrotors or autonomous vehicles, under resource and time constraints. Videos illustrating our approach in both simulation and real-world experiments are available on the project page: https://sites.google.com/view/aesop-llm.
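The two-stage decision logic described above can be sketched as follows. This is a minimal, runnable illustration of the control flow only: `embed` is a toy character-frequency encoder standing in for an LLM embedding model, and `slow_fallback_selection` is a placeholder for the generative-LLM reasoner; both names, the distance-based anomaly score, and the threshold value are illustrative assumptions, not the paper's implementation.

```python
import math

def embed(observation: str) -> list:
    # Toy stand-in for an LLM embedding model: a normalized
    # character-frequency vector over a-z.
    vec = [0.0] * 26
    for ch in observation.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec

def fast_anomaly_score(obs: str, nominal_embeddings: list) -> float:
    # Stage 1: score an observation by its cosine similarity to the
    # closest embedding of a known-nominal observation.
    e = embed(obs)
    best = max(sum(a * b for a, b in zip(e, n)) for n in nominal_embeddings)
    return 1.0 - best

def slow_fallback_selection(obs: str, fallbacks: list) -> str:
    # Stage 2 placeholder: in the paper, a generative LLM reasons over the
    # observation to choose among fallback plans. Here we just pick the first.
    return fallbacks[0]

def runtime_monitor(obs, nominal_embeddings, fallbacks, threshold=0.3):
    # While stage 2 runs, the MPC strategy keeps all fallback plans jointly
    # feasible to absorb the slow reasoner's latency; this sketch models only
    # the monitor's branching decision, not the controller.
    if fast_anomaly_score(obs, nominal_embeddings) > threshold:
        return ("anomaly", slow_fallback_selection(obs, fallbacks))
    return ("nominal", None)
```

In practice the fast classifier runs on every observation, and only a score above the threshold pays the cost of querying the slow reasoner.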