Efficient multi-agent exploration in expansive state-action spaces remains a longstanding challenge in reinforcement learning. Although exploration driven by novelty, diversity, or uncertainty has attracted increasing attention, exploring without proper guidance leads to redundant effort, which poses a practical issue for the community. This paper introduces a systematic approach, termed LEMAE, which channels informative, task-relevant guidance from a knowledgeable Large Language Model (LLM) for Efficient Multi-Agent Exploration. Specifically, we ground linguistic knowledge from the LLM into symbolic key states that are critical for task fulfillment, in a discriminative manner and at low LLM inference cost. To make use of the key states, we design the Subspace-based Hindsight Intrinsic Reward (SHIR), which guides agents toward key states by increasing reward density. Additionally, we build the Key State Memory Tree (KSMT) to track transitions between key states in a specific task for organized exploration. By reducing redundant exploration, LEMAE outperforms existing SOTA approaches on challenging benchmarks (e.g., SMAC and MPE) by a large margin, achieving a 10x acceleration in certain scenarios.
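To make the roles of key states, SHIR, and KSMT more concrete, the following is a minimal Python sketch of one possible reading of these ideas. It is not the paper's implementation. Every name in it (`KeyState`, `subspace_distance`, `first_hits`, `hindsight_rewards`, `KeyStateMemoryTree`) is hypothetical, and so are the dictionary-based state and the Euclidean distance. It assumes that a key state is given by a boolean discriminator function over a symbolic state, that the hindsight reward is the per-step decrease in distance to the achieved key state measured only over the variables that key state depends on, and that the tree stores the order in which key states were reached.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

State = Dict[str, float]  # assumed symbolic state: variable name -> value


class KeyState:
    """Hypothetical key state: a discriminator plus the variables it depends on."""

    def __init__(self, name: str, is_reached: Callable[[State], bool],
                 subspace: Sequence[str]):
        self.name = name
        self.is_reached = is_reached
        self.subspace = tuple(subspace)


def subspace_distance(a: State, b: State, subspace: Sequence[str]) -> float:
    """Euclidean distance restricted to the listed variables (an assumption)."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in subspace))


def first_hits(trajectory: Sequence[State],
               key_states: Sequence[KeyState]) -> List[Tuple[int, KeyState]]:
    """Return (time step, key state) pairs in the order they are first reached."""
    hits, seen = [], set()
    for t, state in enumerate(trajectory):
        for ks in key_states:
            if ks.name not in seen and ks.is_reached(state):
                seen.add(ks.name)
                hits.append((t, ks))
    return hits


def hindsight_rewards(trajectory: Sequence[State],
                      key_states: Sequence[KeyState]) -> List[float]:
    """Relabel a finished trajectory with dense intrinsic rewards.

    rewards[t] is the reward for the transition into trajectory[t]: the
    decrease in subspace distance to the key state that was reached next.
    """
    rewards = [0.0] * len(trajectory)
    start = 0
    for t_hit, ks in first_hits(trajectory, key_states):
        goal = trajectory[t_hit]  # the achieved state serves as the hindsight goal
        for t in range(start + 1, t_hit + 1):
            before = subspace_distance(trajectory[t - 1], goal, ks.subspace)
            after = subspace_distance(trajectory[t], goal, ks.subspace)
            rewards[t] = before - after
        start = t_hit
    return rewards


class KeyStateMemoryTree:
    """Hypothetical tree whose root-to-node paths are observed key-state orders."""

    def __init__(self) -> None:
        self.children: Dict[str, "KeyStateMemoryTree"] = {}
        self.visits = 0

    def insert(self, names: Sequence[str]) -> None:
        node = self
        node.visits += 1
        for name in names:
            node = node.children.setdefault(name, KeyStateMemoryTree())
            node.visits += 1


# Toy usage: an agent moves along x toward a door at x >= 2; y is irrelevant.
trajectory = [{"x": 0.0, "y": 5.0}, {"x": 1.0, "y": -3.0},
              {"x": 2.0, "y": 7.0}, {"x": 3.0, "y": 0.0}]
door = KeyState("at_door", lambda s: s["x"] >= 2.0, ["x"])
rewards = hindsight_rewards(trajectory, [door])

tree = KeyStateMemoryTree()
tree.insert([ks.name for _, ks in first_hits(trajectory, [door])])
```

In this toy run, the steps that move the agent closer to the door along `x` receive positive reward, the large swings in `y` are ignored because `y` lies outside the key state's subspace, and steps after the key state is reached receive no reward. Restricting the distance to a subspace is what keeps the denser reward signal tied to task-relevant variables in this sketch.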