This paper presents a mathematically tractable model of the Prisoner's Dilemma using the framework of active inference. We design pairs of Bayesian agents that track the joint game state of their own and their opponent's choices in an Iterated Prisoner's Dilemma game. Specifying the agents' belief architecture as a partially observed Markov decision process (POMDP) allows a careful and rigorous investigation of the dynamics of two-player gameplay, including the derivation of optimal conditions for the phase transitions required to achieve certain game-theoretic steady states. We show that the critical time points governing the phase transition are linearly related to each other as a function of the learning rate and the reward function. We then investigate the patterns that emerge when the agents' learning rates are varied, as well as the relationship between the stochastic and deterministic solutions of the two-agent system.