Multi-agent deep reinforcement learning (MADRL) problems often encounter the challenge of sparse rewards. This challenge becomes even more pronounced when coordination among agents is necessary. Because performance depends not on any single agent's behavior alone but on the joint behavior of multiple agents, finding an adequate solution becomes significantly harder. In this context, a group of agents can benefit from actively exploring different joint strategies to determine the most efficient one. In this paper, we propose an approach that rewards strategies in which agents collectively exhibit novel behaviors. We present JIM (Joint Intrinsic Motivation), a multi-agent intrinsic motivation method that follows the centralized learning with decentralized execution paradigm. JIM rewards joint trajectories based on a centralized measure of novelty designed to function in continuous environments. We demonstrate the strengths of this approach both in a synthetic environment designed to reveal shortcomings of state-of-the-art MADRL methods and in simulated robotic tasks. Results show that joint exploration is crucial for solving tasks where the optimal strategy requires a high level of coordination.