Social dilemmas are situations in which groups of individuals would benefit from mutual cooperation, but conflicting interests prevent them from achieving it. Situations of this type resemble many of humanity's most critical challenges, and discovering mechanisms that facilitate the emergence of cooperative behavior remains an open problem. In this paper, we study the behavior of self-interested rational agents that learn world models in a multi-agent reinforcement learning (RL) setting and coexist in environments where social dilemmas can arise. Our simulation results show that groups of agents endowed with world models outperform all other tested groups in such scenarios. We exploit the world model architecture to qualitatively assess the learnt dynamics and confirm that each agent's world model is capable of encoding information about the behavior of the changing environment and the other agents' actions. This is the first work to show that world models facilitate the emergence of complex coordinated behaviors that enable interacting agents to ``understand'' both environmental and social dynamics.