Large language models (LLMs) have grown in popularity due to their natural language interface and pre-trained knowledge, leading to rapidly increasing success in question-answering (QA) tasks. More recently, multi-agent systems of LLM-based agents (Multi-LLM) have been used increasingly for QA. In these settings, the models may each answer the question and reach a consensus, or each model may be specialized to answer questions from a different domain. However, most prior work on Multi-LLM QA has focused on scenarios where the models are queried in a zero-shot manner or are given information sources from which to extract the answer. For question answering in an unknown environment, embodied exploration of the environment is needed before the question can be answered. This skill is necessary for personalizing embodied AI to environments such as households. There is a lack of insight into whether a Multi-LLM system can handle question answering based on observations from embodied exploration. In this work, we address this gap by investigating the use of Multi-Embodied LLM Explorers (MELE) for QA in an unknown environment. Multiple LLM-based agents independently explore and then answer queries about a household environment. We analyze three aggregation methods for producing a single final answer to each query: debating, majority voting, and training a central answer module (CAM). With CAM, we observe $46\%$ higher accuracy than with the non-learning-based aggregation methods. We provide code and the query dataset for further research.