With strong reasoning capabilities and a broad understanding of the world, Large Language Models (LLMs) have demonstrated immense potential for building versatile embodied decision-making agents capable of executing a wide array of tasks. Nevertheless, we show that when deployed in unfamiliar environments, LLM agents struggle to gather essential information efficiently, leading to suboptimal performance. In contrast, humans often seek additional information from their peers prior to taking action, harnessing external knowledge to avoid unnecessary trial and error. Drawing inspiration from this behavior, we propose \textit{Asking Before Acting} (ABA), a method that empowers the agent to proactively query external sources for pertinent information in natural language during its interactions with the environment. In this way, the agent can improve its efficiency and performance by circumventing potentially laborious steps and mitigating the difficulties of exploring unfamiliar environments and interpreting vague instructions. We conduct extensive experiments across a spectrum of environments, including text-based household everyday tasks, robot arm manipulation tasks, and real-world open-domain image-based embodied tasks, using a range of models from Vicuna to GPT-4. The results demonstrate that, even with modest prompt modifications, ABA achieves substantial gains in both performance and efficiency over baseline LLM agents. Further finetuning ABA with reformulated metadata (ABA-FT) facilitates learning the rationale for asking and yields additional improvements, especially on tasks that baselines struggle to solve.