The Embodied Everyday Task is a popular task in the embodied AI community that requires agents to produce a sequence of actions from natural language instructions and visual observations. Traditional learning-based approaches face two challenges. First, natural language instructions often lack explicit task planning. Second, extensive training is required to equip models with knowledge of the task environment. Previous works based on Large Language Models (LLMs) either perform poorly because they lack task-specific knowledge or rely on ground truth as few-shot samples. To address these limitations, we propose Progressive Retrieval Augmented Generation (P-RAG), a novel approach that leverages the language processing capabilities of LLMs while progressively accumulating task-specific knowledge without ground truth. Unlike conventional RAG methods, which retrieve relevant information from the database in a one-shot manner to assist generation, P-RAG updates the database iteratively. In each iteration, P-RAG retrieves from the latest database and obtains historical information from the previous interaction as experiential references for the current interaction. We also introduce a more granular retrieval scheme that retrieves not only similar tasks but also similar situations, providing more valuable reference experiences. Extensive experiments show that P-RAG achieves competitive results without using ground truth and can further improve its performance through self-iteration.
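The iterative loop described above can be illustrated with a minimal sketch. It is not the authors' implementation. Every name in it (`Experience`, `ExperienceDB`, `progressive_rag`, `policy`, `run_step`) is hypothetical. Token-overlap (Jaccard) similarity stands in for whatever similarity measure the method actually uses, and the database is assumed to be updated once at the end of each iteration.

```python
from dataclasses import dataclass, field


@dataclass
class Experience:
    task: str        # natural language instruction
    situation: str   # text description of the observation at one step
    action: str      # action the agent took in that situation
    success: bool    # whether the episode ended successfully


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for an embedding score."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class ExperienceDB:
    records: list = field(default_factory=list)

    def add(self, exp: Experience) -> None:
        self.records.append(exp)

    def retrieve(self, task: str, situation: str, k: int = 2) -> list:
        """Rank records by task similarity plus situation similarity."""
        ranked = sorted(
            self.records,
            key=lambda e: jaccard(task, e.task) + jaccard(situation, e.situation),
            reverse=True,
        )
        return ranked[:k]


def progressive_rag(tasks, policy, run_step, n_iterations=2, max_steps=10):
    """Run every task once per iteration and grow the database afterwards.

    tasks:    list of (instruction, initial_situation) pairs
    policy:   callable(task, situation, references) -> action, e.g. an LLM call
    run_step: callable(task, action) -> (next_situation, done, success)
    """
    db = ExperienceDB()
    for _ in range(n_iterations):
        new_records = []
        for task, situation in tasks:
            trajectory, success = [], False
            for _ in range(max_steps):
                # Retrieve from the database built in earlier iterations.
                refs = db.retrieve(task, situation)
                action = policy(task, situation, refs)
                trajectory.append((situation, action))
                situation, done, success = run_step(task, action)
                if done:
                    break
            # Label each step with the episode outcome; no ground truth is used.
            new_records += [Experience(task, s, a, success) for s, a in trajectory]
        # Update the database only after the iteration finishes.
        for exp in new_records:
            db.add(exp)
    return db
```

In this sketch the first iteration runs with an empty database, so `policy` receives no references. Later iterations receive records from earlier ones, ranked by both task and situation similarity, which mirrors the two retrieval granularities mentioned in the abstract.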