Researchers have studied the performance of Iterated Prisoner's Dilemma strategies for decades, from the celebrated success of Tit For Tat to the introduction of zero-determinant strategies and the use of sophisticated learning structures such as neural networks. Many new strategies have been introduced and tested in a variety of tournaments and population dynamics. Typical results in the literature, however, rely on performance against a small number of somewhat arbitrarily selected strategies in a small number of tournaments, which casts doubt on the generalizability of their conclusions. In this work, we analyze a large collection of 195 strategies in thousands of computer tournaments, present the top-performing strategies across multiple tournament types, and distill their salient features. The results show that no single strategy yet performs well across diverse Iterated Prisoner's Dilemma scenarios; nevertheless, several properties heavily influence the best-performing strategies. These findings refine the properties described by Axelrod, in light of recent and more diverse opponent populations, to: be nice, be provocable and generous, be a little envious, be clever, and adapt to the environment. More precisely, we find that strategies perform best when their probability of cooperation matches the aggregate cooperation probability of the total tournament population. The features of high-performing strategies help explain why strategies such as Tit For Tat historically performed well in tournaments and why zero-determinant strategies typically do not fare well in tournament settings. Furthermore, our findings have implications for the future training of autonomous agents, since understanding which features are crucial to build into such agents becomes essential.