In multi-agent reinforcement learning (MARL), the centralized training with decentralized execution (CTDE) framework has gained widespread adoption due to its strong performance. However, the further development of CTDE faces two key challenges. First, agents struggle to autonomously assess the relevance of input information for cooperative tasks, impairing their decision-making abilities. Second, in communication-limited scenarios with partial observability, agents are unable to access global information, restricting their ability to collaborate effectively from a global perspective. To address these challenges, we introduce a novel cooperative MARL framework based on information selection and tacit learning. In this framework, agents gradually develop implicit coordination during training, enabling them to infer the cooperative behavior of others in a discrete space without communication, relying solely on local information. Moreover, we integrate gating and selection mechanisms, allowing agents to adaptively filter information based on environmental changes, thereby enhancing their decision-making capabilities. Experiments on popular MARL benchmarks show that our framework can be seamlessly integrated with state-of-the-art algorithms, leading to significant performance improvements.
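To make the idea of adaptive information filtering concrete, the sketch below shows one common way a learned gate can attenuate low-relevance observation features before they reach an agent's decision network. This is an illustrative toy, not the paper's actual mechanism: the `gate_features` helper, its weights, and the example observation are all hypothetical.

```python
import math

def sigmoid(x: float) -> float:
    """Standard logistic function, mapping a score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gate_features(features, weights, biases):
    """Elementwise sigmoid gating (illustrative only).

    Each feature is scaled by a relevance score in (0, 1), so
    inputs the gate deems irrelevant are suppressed before the
    agent's policy network sees them. In a trained system the
    weights and biases would be learned; here they are fixed
    hypothetical values.
    """
    return [sigmoid(w * f + b) * f
            for f, w, b in zip(features, weights, biases)]

# Example: a strongly negative gate weight on the second feature
# drives its gate toward 0, filtering it from the effective input.
obs = [0.8, 0.5, -0.3]
gated = gate_features(obs,
                      weights=[5.0, -10.0, 5.0],
                      biases=[0.0, 0.0, 0.0])
```

Here the second feature is almost entirely suppressed while the first passes through nearly unchanged, mimicking how an agent could down-weight inputs that are irrelevant to the current cooperative task.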