Recent advances in large language models (LLMs) have led to significant progress in robotics, enabling embodied agents to better understand and execute open-ended tasks. However, existing LLM-based approaches face limitations in grounding their outputs in the physical environment and aligning them with the capabilities of the robot. This challenge becomes even more pronounced with smaller language models, which are more computationally efficient but less robust in task planning and execution. In this paper, we present a novel modular architecture designed to enhance the robustness of LLM-driven robotics by addressing these grounding and alignment issues. We formalize the task planning problem within a goal-conditioned POMDP framework, identify key failure modes in LLM-driven planning, and propose targeted design principles to mitigate these issues. Our architecture introduces an ``expected outcomes'' module to prevent mischaracterization of subgoals and a feedback mechanism to enable real-time error recovery. Experimental results, both in simulation and on physical robots, demonstrate that our approach significantly improves task success rates for pick-and-place and manipulation tasks compared to both larger LLMs and standard baselines. Through hardware experiments, we also demonstrate how our architecture can be run efficiently and locally. This work highlights the potential of smaller, locally executable LLMs in robotics and provides a scalable, efficient solution for robust task execution.