We study the evolution of behavior under reinforcement learning in a Prisoner's Dilemma in which agents interact on a regular network and can learn whether an interaction is one-shot or repeated by paying a cost of deliberation. Compared with other behavioral rules used in the literature, (i) we confirm the existence of a threshold value of the probability of repeated interaction, at which the emergent behavior switches from intuitive defector to dual-process cooperator; (ii) we find a different role for node degree, with smaller degrees reducing the evolutionary success of dual-process cooperators; (iii) we observe a higher frequency of deliberation.