With expansive state-action spaces, efficient multi-agent exploration remains a longstanding challenge in reinforcement learning. Although pursuing novelty, diversity, or uncertainty has attracted increasing attention, the redundant effort incurred by exploration without proper guidance poses a practical issue for the community. This paper introduces a systematic approach, termed LEMAE, that channels informative task-relevant guidance from a knowledgeable Large Language Model (LLM) into Efficient Multi-Agent Exploration. Specifically, we ground linguistic knowledge from the LLM into symbolic key states that are critical for task fulfillment, in a discriminative manner and at low LLM inference cost. To unleash the power of key states, we design Subspace-based Hindsight Intrinsic Reward (SHIR), which guides agents toward key states by increasing reward density. Additionally, we build the Key State Memory Tree (KSMT) to track transitions between key states within a specific task, enabling organized exploration. By reducing redundant exploration, LEMAE outperforms existing state-of-the-art approaches on challenging benchmarks (e.g., SMAC and MPE) by a large margin, achieving a 10x acceleration in certain scenarios.