Cooperative behavior is prevalent in both human society and nature. Understanding how cooperation emerges and persists among self-interested individuals remains a significant challenge in evolutionary biology and the social sciences. Reinforcement learning (RL) provides a suitable framework for studying evolutionary games because RL agents adapt to environmental changes and act to maximize expected rewards. In this study, we employ the State-Action-Reward-State-Action (SARSA) algorithm as the decision-making mechanism for individuals in evolutionary games. First, we apply SARSA to imitation learning, where agents select which neighbor to imitate based on rewards. This setting allows us to observe behavioral changes in agents that lack independent decision-making abilities. Next, primary agents use SARSA to choose independently whether to cooperate with or defect against their neighbors. We evaluate the impact of SARSA on cooperation rates by analyzing variations in rewards and the distribution of cooperators and defectors within the network.
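To make the decision mechanism concrete, the following is a minimal sketch of a tabular SARSA update for an agent choosing between cooperation and defection. It is an illustration under stated assumptions, not the study's implementation: the payoff values, the learning rate `alpha`, the discount factor `gamma`, the exploration rate `epsilon`, and the helper names `epsilon_greedy`, `sarsa_update`, and `play_round` are all hypothetical, and the state is left as an arbitrary hashable label because the abstract does not specify how states are defined.

```python
import random

ACTIONS = ("C", "D")  # cooperate, defect

# Hypothetical prisoner's dilemma payoffs for the focal agent
# (an assumption for illustration; the study's payoffs are not given here).
PAYOFF = {("C", "C"): 3.0, ("C", "D"): 0.0, ("D", "C"): 5.0, ("D", "D"): 1.0}


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """On-policy TD update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    current = q.get((state, action), 0.0)
    target = reward + gamma * q.get((next_state, next_action), 0.0)
    q[(state, action)] = current + alpha * (target - current)
    return q[(state, action)]


def play_round(action, neighbor_actions):
    """Total payoff from playing `action` against each neighbor's action."""
    return sum(PAYOFF[(action, other)] for other in neighbor_actions)
```

In a simulation loop, each agent would call `epsilon_greedy` to pick an action, collect its reward with `play_round`, pick its next action with the same policy, and then call `sarsa_update` with both actions. Using the action actually chosen for the next step, rather than the greedy one, is what makes the update on-policy.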