Visual navigation is a critical capability for household service robots. As navigation tasks grow more complex, effective communication and collaboration among multiple robots become essential to completing them successfully. In recent years, large language models (LLMs) have shown remarkable comprehension and planning abilities when used in embodied agents. However, their application in household scenarios remains largely unexplored, specifically settings where multiple agents communicate and collaborate to complete complex navigation tasks. This paper therefore proposes a framework for decentralized multi-agent navigation that leverages LLM-enabled communication and collaboration. We design a communication-triggered dynamic leadership organization structure, which lets the team reach consensus faster with fewer communication instances. This leads to better navigation effectiveness and more efficient collaborative exploration. With the proposed communication scheme, our framework achieves conflict-free, robust collaboration in multi-object navigation tasks, even as team size increases sharply.