Towards Robots That Know When They Need Help: Affordance-Based Uncertainty for Large Language Model Planners

Large language models (LLMs) showcase many desirable traits for intelligent and helpful robots. However, they are also known to hallucinate predictions. This issue is exacerbated in consumer robotics, where LLM hallucinations may cause robots to confidently execute plans that are contrary to user goals, to rely more frequently on human assistance, or to fail to ask for help at all. In this work, we present LAP, a novel approach for utilizing off-the-shelf LLMs, alongside scene and object Affordances, in robotic Planners that minimize harmful hallucinations and know when to ask for help. Our key finding is that calculating and leveraging a scene affordance score, a measure of whether a given action is possible in the provided scene, helps to mitigate hallucinations in LLM predictions and better aligns the LLM's confidence measure with the probability of success. We specifically propose and test three different affordance scores, which can be used independently or in tandem to improve performance across different use cases. The most successful of these individual scores is obtained by prompting an LLM to determine whether a given action is possible and safe in the given scene and using the LLM's response to compute the score. Through experiments in both simulation and the real world, on tasks with a variety of ambiguities, we show that LAP significantly increases the success rate and decreases the amount of human intervention required relative to prior art. For example, in our real-world testing paradigm, LAP decreases the human help rate of previous methods by over 33% at a success rate of 70%.
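
The abstract does not spell out how the affordance score and the LLM's confidence are combined, so the following Python sketch is only an illustration of the general idea, not the authors' implementation. It assumes that each candidate action carries an LLM confidence and an affordance score in [0, 1], that the two are multiplied and renormalized, and that the robot asks for help whenever the number of actions clearing a fixed threshold is not exactly one. The function names (`combine_scores`, `plan_or_ask`), the threshold value, and the example actions and numbers are all hypothetical.

```python
from typing import Dict, List, Optional, TypedDict


class Decision(TypedDict):
    ask_for_help: bool
    action: Optional[str]
    candidates: List[str]


def combine_scores(llm_conf: Dict[str, float],
                   affordance: Dict[str, float]) -> Dict[str, float]:
    """Weight each action's LLM confidence by its affordance score, then renormalize.

    Assumption: a simple product rule; actions missing an affordance score get 0.
    """
    weighted = {a: c * affordance.get(a, 0.0) for a, c in llm_conf.items()}
    total = sum(weighted.values())
    if total == 0.0:
        # No action is considered feasible in this scene.
        return {a: 0.0 for a in llm_conf}
    return {a: w / total for a, w in weighted.items()}


def plan_or_ask(llm_conf: Dict[str, float],
                affordance: Dict[str, float],
                threshold: float = 0.25) -> Decision:
    """Act only if exactly one action clears the (hypothetical) threshold."""
    scores = combine_scores(llm_conf, affordance)
    candidates = [a for a, s in scores.items() if s >= threshold]
    if len(candidates) == 1:
        return {"ask_for_help": False, "action": candidates[0],
                "candidates": candidates}
    # Zero or several plausible actions remain: defer to the human.
    return {"ask_for_help": True, "action": None, "candidates": candidates}


# Hypothetical instruction: "heat up the bowl" with two bowls on the counter.
LLM_CONF = {
    "put the metal bowl in the microwave": 0.5,
    "put the plastic bowl in the microwave": 0.4,
    "put the plastic bowl in the sink": 0.1,
}
# Hypothetical affordance scores: microwaving metal is judged unsafe.
AFFORDANCE = {
    "put the metal bowl in the microwave": 0.0,
    "put the plastic bowl in the microwave": 1.0,
    "put the plastic bowl in the sink": 1.0,
}
# Baseline with no affordance information (every action treated as feasible).
NO_AFFORDANCE = {action: 1.0 for action in LLM_CONF}

if __name__ == "__main__":
    print(plan_or_ask(LLM_CONF, NO_AFFORDANCE))
    print(plan_or_ask(LLM_CONF, AFFORDANCE))
```

In this toy scene, the LLM confidences alone leave two microwave actions above the threshold, so the robot would ask for help. Once the unsafe action is zeroed out by its affordance score, a single candidate remains and the robot can act without intervention, which mirrors the reduction in human help that the abstract reports.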


