This paper presents a mathematically tractable model of the Prisoner's Dilemma using the framework of active inference. We design pairs of Bayesian agents that track the joint game state of their own and their opponent's choices in an Iterated Prisoner's Dilemma game. Specifying the agents' belief architecture as a partially observed Markov decision process allows careful and rigorous investigation of the dynamics of two-player gameplay, including the derivation of optimal conditions for the phase transitions required to reach certain game-theoretic steady states. We show that the critical time points governing the phase transition are linearly related to each other as a function of the learning rate and the reward function. We then investigate the patterns that emerge when the agents' learning rates are varied, as well as the relationship between the stochastic and deterministic solutions of the two-agent system.
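To make the notion of a joint game state concrete, the following is a minimal sketch, not the model specified in the paper. It assumes the conventional Prisoner's Dilemma payoff values (5, 3, 1, 0) and a simple count-based (Dirichlet-style) belief over transitions between joint states, in which each observed transition adds `learning_rate` to the corresponding count. The names `JOINT_STATES`, `PAYOFF`, `update_counts`, and `transition_belief` are hypothetical and introduced only for illustration.

```python
import itertools

ACTIONS = ("C", "D")  # cooperate, defect

# Joint game state: (own action, opponent's action).
JOINT_STATES = list(itertools.product(ACTIONS, ACTIONS))

# Assumed payoffs to the focal agent (conventional values, not from the paper).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def uniform_counts(prior=1.0):
    """Start with equal pseudo-counts for every joint-state transition."""
    return {s: {t: prior for t in JOINT_STATES} for s in JOINT_STATES}


def update_counts(counts, prev_state, next_state, learning_rate):
    """Return new counts with one observed transition added.

    Assumption: the learning rate scales the increment of the count.
    """
    new_counts = {s: dict(row) for s, row in counts.items()}
    new_counts[prev_state][next_state] += learning_rate
    return new_counts


def transition_belief(counts, prev_state):
    """Normalise the counts for prev_state into a probability distribution."""
    row = counts[prev_state]
    total = sum(row.values())
    return {t: c / total for t, c in row.items()}


counts = uniform_counts()
counts = update_counts(counts, ("C", "C"), ("C", "C"), learning_rate=1.0)
belief = transition_belief(counts, ("C", "C"))
```

Under these assumptions, a larger `learning_rate` shifts the belief toward the observed transition more quickly, which is the kind of dependence on learning rate that the abstract refers to.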