The ability to model the mental states of others is crucial to human social intelligence, and it can offer similar benefits to artificial agents in the social dynamics that arise in multi-agent settings. We present a method for grounding semantically meaningful, human-interpretable beliefs within policies modeled by deep networks. We then consider the task of second-order belief prediction, in which an agent predicts the beliefs held by other agents. We propose that the ability of each agent to predict the beliefs of the other agents can be used as an intrinsic reward signal for multi-agent reinforcement learning. Finally, we present preliminary empirical results in a mixed cooperative-competitive environment.