Exploration in cooperative multi-agent reinforcement learning (MARL) remains challenging for value-based agents due to the absence of an explicit policy. Existing approaches include individual exploration driven by an agent's uncertainty about the system and collective exploration driven by behavioral diversity among agents. However, the additional structures these methods introduce often reduce training efficiency and make the two approaches infeasible to integrate. In this paper, we propose Adaptive exploration via Identity Recognition~(AIR), which consists of two adversarial components: a classifier that recognizes agent identities from their trajectories, and an action selector that adaptively adjusts the mode and degree of exploration. We prove theoretically that AIR facilitates both individual and collective exploration during training, and experiments across a variety of tasks further demonstrate its efficiency and effectiveness.
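To make the adversarial interplay concrete, the following is a minimal, hypothetical sketch (not the paper's actual architecture): a linear classifier assigns identity probabilities to a trajectory feature, and an exploration schedule lowers an agent's epsilon when the classifier already recognizes it confidently (its behavior is distinctive) and raises epsilon otherwise. The function names, the linear classifier, and the confidence-to-epsilon rule are all illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over identity logits.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def identity_probs(trajectory_feat, W):
    """Toy identity classifier: a linear map from a trajectory feature
    vector to a probability distribution over agent identities.
    (Stand-in for the learned classifier; purely illustrative.)"""
    return softmax(W @ trajectory_feat)

def adaptive_epsilon(probs, agent_id, eps_min=0.05, eps_max=0.5):
    """Hypothetical adaptive rule: if the classifier confidently
    recognizes this agent (high probs[agent_id]), its behavior is
    already distinguishable, so explore less; if not, explore more
    to encourage behavioral diversity among agents."""
    confidence = probs[agent_id]
    return eps_max - (eps_max - eps_min) * confidence
```

In an adversarial setup of this shape, the classifier is trained to identify agents from trajectories while the action selector uses the classifier's output to modulate exploration, coupling individual exploration (per-agent epsilon) with collective exploration (diversity pressure across agents).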