Multi-agent reinforcement learning (MARL) is a prevalent learning paradigm for solving stochastic games. In most MARL studies, the agents in a game are defined as teammates or enemies beforehand, and the relationships among them remain fixed throughout the game. In real-world problems, however, agent relationships are commonly unknown in advance or change dynamically. Many multi-party interactions start by asking: who is on my team? This question arises whether it is the first day at the stock exchange or at kindergarten. Training policies for such situations, in the face of imperfect information and ambiguous identities, is therefore an important problem. In this work, we develop a novel identity detection reinforcement learning (IDRL) framework that allows an agent to dynamically infer the identities of nearby agents and select an appropriate policy to accomplish the task. In the IDRL framework, a relation network is constructed to deduce the identities of other agents by observing their behaviors. A danger network is optimized to estimate the risk of false-positive identifications. Beyond that, we propose an intrinsic reward that balances the need to maximize external rewards against the need for accurate identification. After identifying the cooperation-competition pattern among the agents, IDRL applies an off-the-shelf MARL method to learn the policy. To evaluate the proposed method, we conduct experiments on the Red-10 card-shedding game, and the results show that IDRL achieves superior performance over other state-of-the-art MARL methods. Notably, the relation network identifies agent identities on par with top human players, and the danger network reasonably avoids the risk of imperfect identification. The code to reproduce all the reported results is available online at https://github.com/MR-BENjie/IDRL.
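As a rough illustration of the decision flow described above, the following Python sketch shows how an agent might combine a relation-network estimate with a danger-network estimate to choose a policy mode. It is not taken from the IDRL code base: the names `IdentityEstimate` and `select_mode`, the three mode labels, the 0.5 thresholds, and the rule that a high danger score falls back to a cautious mode are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class IdentityEstimate:
    """Hypothetical per-agent outputs; both fields are assumptions."""
    teammate_prob: float  # assumed relation-network output in [0, 1]
    danger: float         # assumed danger-network risk score in [0, 1]


def select_mode(est: IdentityEstimate, danger_threshold: float = 0.5) -> str:
    """Choose a policy mode from the two estimates.

    Assumed rule: if the estimated risk of a wrong identification is
    above the threshold, ignore the identity guess and act cautiously;
    otherwise cooperate or compete according to the identity guess.
    """
    if est.danger > danger_threshold:
        return "cautious"
    return "cooperate" if est.teammate_prob >= 0.5 else "compete"


if __name__ == "__main__":
    # A confident teammate guess with low estimated risk.
    print(select_mode(IdentityEstimate(teammate_prob=0.9, danger=0.1)))
    # The same guess is overridden when the estimated risk is high.
    print(select_mode(IdentityEstimate(teammate_prob=0.9, danger=0.8)))
```

In this sketch the danger estimate acts as a gate on the identity guess, which is one plausible reading of "estimating the risk of false-positive identifications"; the actual framework may combine the two networks differently.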