With expansive state-action spaces, efficient multi-agent exploration remains a longstanding challenge in reinforcement learning. Although pursuing novelty, diversity, or uncertainty has attracted increasing attention, the redundant effort incurred by exploration without proper guidance poses a practical issue for the community. This paper introduces a systematic approach, termed LEMAE, that channels informative task-relevant guidance from a knowledgeable Large Language Model (LLM) into Efficient Multi-Agent Exploration. Specifically, we ground linguistic knowledge from the LLM into symbolic key states that are critical for task fulfillment, in a discriminative manner and at low LLM inference cost. To unleash the power of key states, we design Subspace-based Hindsight Intrinsic Reward (SHIR), which guides agents toward key states by increasing reward density. Additionally, we build the Key State Memory Tree (KSMT) to track transitions between key states within a specific task, enabling organized exploration. By reducing redundant exploration, LEMAE outperforms existing state-of-the-art approaches on challenging benchmarks (e.g., SMAC and MPE) by a large margin, achieving a 10x acceleration in certain scenarios.