Centralized training with decentralized execution (CTDE) is a widely used learning paradigm that has achieved significant success in complex tasks. However, partial observability and the absence of effectively shared signals between agents often limit its effectiveness in fostering cooperation. While communication can address this challenge, it simultaneously reduces the algorithm's practicality. Drawing inspiration from human team cooperative learning, we propose a novel paradigm that facilitates a gradual shift from explicit communication to tacit cooperation. In the initial training stage, we promote cooperation by sharing relevant information among agents and concurrently reconstructing this information using each agent's local trajectory. We then combine the explicitly communicated information with the reconstructed information to obtain mixed information. Throughout the training process, we progressively reduce the proportion of explicitly communicated information, facilitating a seamless transition to fully decentralized execution without communication. Experimental results in various scenarios demonstrate that the performance of our method without communication approaches or even surpasses that of QMIX and communication-based methods.
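The mixing step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the proportion of communicated information is reduced or how the two sources are combined, so the linear schedule, the convex combination, and the names `comm_ratio`, `mix_information`, and `decay_steps` are all assumptions made for illustration, not the authors' implementation.

```python
def comm_ratio(step, decay_steps):
    """Assumed linear schedule: 1.0 at step 0, falling to 0.0 at decay_steps."""
    if decay_steps <= 0:
        raise ValueError("decay_steps must be positive")
    return max(0.0, 1.0 - step / decay_steps)


def mix_information(communicated, reconstructed, ratio):
    """Assumed convex combination of communicated and reconstructed vectors.

    ratio = 1.0 uses only the explicitly communicated information;
    ratio = 0.0 uses only the information reconstructed from the
    agent's local trajectory (no communication needed).
    """
    if len(communicated) != len(reconstructed):
        raise ValueError("vectors must have the same length")
    return [ratio * c + (1.0 - ratio) * r
            for c, r in zip(communicated, reconstructed)]


# Hypothetical values: what teammates sent vs. what the agent reconstructed.
communicated = [1.0, 2.0]
reconstructed = [0.0, 4.0]

early = mix_information(communicated, reconstructed, comm_ratio(0, 80))
middle = mix_information(communicated, reconstructed, comm_ratio(40, 80))
late = mix_information(communicated, reconstructed, comm_ratio(80, 80))
```

Under these assumptions, `early` equals the communicated vector, `middle` is an equal blend of the two, and `late` equals the reconstructed vector, which corresponds to fully decentralized execution without communication.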