Evolutionary game theory provides a mathematical foundation for cross-disciplinary work, especially for integrating ideas from artificial intelligence and game theory. Such integration offers a transparent and rigorous approach to complex decision-making problems in a variety of important contexts, ranging from evolutionary computation to machine behavior. Despite the astronomically large space of individual behavioral strategies in the iterated Prisoner's Dilemma (IPD), the so-called Zero-Determinant (ZD) strategies form a set of rather simple memory-one strategies that can nonetheless unilaterally enforce a linear relationship between their own payoff and their opponent's. Although knowledge of ZD strategies gives players an upper hand in the IPD, we find and characterize unbending strategies that can force ZD players to be fair in their own interest. Moreover, our analysis reveals that unbending properties, previously overlooked, are ubiquitous in common IPD strategies. In this work, we demonstrate the important steering role of unbending strategies in fostering fairness and cooperation in pairwise interactions. Our results offer a new perspective that combines game theory with multi-agent learning systems to optimize winning strategies that are robust to noise, errors, and deception in non-zero-sum games.
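To make the linear payoff relationship concrete, the following minimal sketch builds an extortionate ZD strategy using the standard Press and Dyson construction and checks numerically that the relation s_X - P = chi * (s_Y - P) holds against several opponents. The payoff values (R, S, T, P) = (3, 0, 5, 1), the function names, and the sample opponent strategies are assumptions chosen for illustration; they are not taken from this work, and the sketch does not implement the unbending-strategy analysis itself.

```python
# Conventional IPD payoffs (an assumption; the abstract does not fix values).
R, S, T, P = 3.0, 0.0, 5.0, 1.0


def extortionate_zd(chi, phi):
    """Return (p_CC, p_CD, p_DC, p_DD) for an extortionate ZD strategy.

    Standard construction: p - (1, 1, 0, 0) = phi * ((S_X - P) - chi * (S_Y - P)),
    which enforces s_X - P = chi * (s_Y - P). phi must be small enough that
    every entry stays within [0, 1].
    """
    return (
        1.0 - phi * (chi - 1.0) * (R - P),
        1.0 - phi * ((P - S) + chi * (T - P)),
        phi * ((T - P) + chi * (P - S)),
        0.0,
    )


def long_run_payoffs(p, q, steps=5000):
    """Approximate long-run average payoffs (s_X, s_Y) for memory-one p vs. q.

    States are ordered CC, CD, DC, DD from X's point of view. q is indexed
    from Y's point of view, so its CD and DC entries are swapped here.
    The stationary distribution is approximated by power iteration, which
    assumes the resulting Markov chain is ergodic.
    """
    q_seen_by_x = (q[0], q[2], q[1], q[3])
    rows = [
        (px * qy, px * (1 - qy), (1 - px) * qy, (1 - px) * (1 - qy))
        for px, qy in zip(p, q_seen_by_x)
    ]
    v = [0.25] * 4
    for _ in range(steps):
        v = [sum(v[i] * rows[i][j] for i in range(4)) for j in range(4)]
    s_x = sum(vi * u for vi, u in zip(v, (R, S, T, P)))
    s_y = sum(vi * u for vi, u in zip(v, (R, T, S, P)))
    return s_x, s_y


# Hypothetical parameters: extortion factor chi = 3 and scale phi = 1/26.
p = extortionate_zd(chi=3.0, phi=1.0 / 26.0)

# Hypothetical opponents with interior cooperation probabilities.
opponents = [(0.9, 0.1, 0.8, 0.3), (0.5, 0.5, 0.5, 0.5), (0.99, 0.2, 0.6, 0.05)]
for q in opponents:
    s_x, s_y = long_run_payoffs(p, q)
    # X's surplus over P should be chi times Y's surplus, whatever q is.
    print(q, "X surplus:", round(s_x - P, 6), "Y surplus:", round(s_y - P, 6))
```

Whatever memory-one strategy the opponent picks, the ZD player's surplus over the mutual-defection payoff P stays at three times the opponent's surplus. This is the unilateral control that unbending strategies are described as counteracting.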