Large language models (LLMs) exhibit a wide range of promising capabilities, from step-by-step planning to commonsense reasoning, that may provide utility for robots, but they remain prone to confidently hallucinated predictions. In this work, we present KnowNo, a framework for measuring and aligning the uncertainty of LLM-based planners so that they know when they don't know and ask for help when needed. KnowNo builds on the theory of conformal prediction to provide statistical guarantees on task completion while minimizing human help in complex multi-step planning settings. Experiments across a variety of simulated and real robot setups that involve tasks with different modes of ambiguity (e.g., from spatial to numeric uncertainties, from human preferences to Winograd schemas) show that KnowNo performs favorably against modern baselines (which may involve ensembles or extensive prompt tuning) in improving efficiency and autonomy, while providing formal assurances. KnowNo can be used with LLMs out of the box without model fine-tuning, and suggests a promising lightweight approach to modeling uncertainty that can complement and scale with the growing capabilities of foundation models. Website: https://robot-help.github.io
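To make the role of conformal prediction in the abstract concrete, the following is a minimal sketch of generic split conformal prediction applied to a planner that scores a fixed list of candidate actions. It is not the KnowNo implementation. The function names, the use of per-option probabilities as confidence scores, and the rule "ask for help unless the prediction set contains exactly one option" are assumptions made for illustration.

```python
import math


def calibrate_min_confidence(cal_true_probs, epsilon):
    """Split conformal calibration (illustrative, not the paper's code).

    cal_true_probs: confidence the planner assigned to the correct option
                    on each held-out calibration task (hypothetical input).
    epsilon:        target error rate, so the target coverage is 1 - epsilon.

    Using the nonconformity score s = 1 - p, the conformal quantile is the
    ceil((n + 1) * (1 - epsilon))-th smallest score, which is the same as
    the ceil((n + 1) * (1 - epsilon))-th largest confidence.
    """
    n = len(cal_true_probs)
    rank = math.ceil((n + 1) * (1.0 - epsilon))
    if rank > n:
        # Too few calibration points for this epsilon: keep every option.
        return 0.0
    return sorted(cal_true_probs, reverse=True)[rank - 1]


def prediction_set(option_probs, min_confidence):
    """Keep every option whose confidence reaches the calibrated threshold."""
    return [opt for opt, p in option_probs.items() if p >= min_confidence]


def needs_help(option_probs, min_confidence):
    """Assumed trigger: ask a human unless exactly one option remains."""
    return len(prediction_set(option_probs, min_confidence)) != 1


# Hypothetical calibration data: confidence placed on the correct action.
cal = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.95, 0.85]
threshold = calibrate_min_confidence(cal, epsilon=0.25)

clear_task = {"A": 0.7, "B": 0.2, "C": 0.1}
ambiguous_task = {"A": 0.45, "B": 0.42, "C": 0.13}
```

Under the standard exchangeability assumption between calibration and test tasks, a set built this way contains the correct option with probability at least 1 - epsilon. In this toy data the threshold is the eighth-largest calibration confidence, so `clear_task` yields a single option and proceeds autonomously, while `ambiguous_task` yields two options and would trigger a request for help.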