Spatial public goods games are characterized by high-dimensional state spaces and localized externalities, which pose significant challenges to achieving stable and widespread cooperation. Traditional approaches often struggle to capture neighborhood-level strategic interactions and to dynamically align individual incentives with collective welfare. To address these challenges, this paper introduces a novel intelligent decision-making framework, Local Mean-Field Proximal Policy Optimization with Unbalanced Punishment (LMFPPO-UBP). The conventional mean-field concept is reformulated as a socio-statistical sensor embedded directly into the policy-gradient space of deep reinforcement learning, allowing agents to adapt their strategies to mesoscale neighborhood dynamics. In addition, an unbalanced punishment mechanism penalizes defectors in proportion to the local density of cooperators, reshaping payoff structures without imposing direct costs on cooperative agents. Experimental results demonstrate that LMFPPO-UBP promotes rapid and stable global cooperation even under low enhancement factors, consistently outperforming baselines such as Q-learning and Fermi update rules. Statistical analyses further confirm that the framework lowers the cooperation threshold and yields better-coordinated outcomes.
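To make the two core mechanisms concrete, the following is a minimal sketch, assuming a 2D lattice spatial public goods game with von Neumann neighborhoods and periodic boundaries. It illustrates the local mean field (the per-agent cooperator density used as the socio-statistical sensor) and an unbalanced punishment that fines defectors in proportion to that density at no cost to cooperators. All names and parameter values (grid shape, enhancement factor `r`, contribution `c`, punishment strength `beta`) are illustrative assumptions, not the paper's exact implementation.

```python
# Hypothetical sketch: local mean field + unbalanced punishment on a lattice PGG.
# Strategies: 1 = cooperate, 0 = defect. Not the authors' actual code.
import numpy as np

def local_coop_density(strategies: np.ndarray) -> np.ndarray:
    """Local mean field: fraction of cooperators among each agent's four
    lattice neighbors (periodic boundaries)."""
    up    = np.roll(strategies, -1, axis=0)
    down  = np.roll(strategies,  1, axis=0)
    left  = np.roll(strategies, -1, axis=1)
    right = np.roll(strategies,  1, axis=1)
    return (up + down + left + right) / 4.0

def payoffs_with_unbalanced_punishment(strategies, r=3.0, c=1.0, beta=0.5):
    """Payoff from the focal group of size 5 (agent + 4 neighbors), plus an
    asymmetric fine on defectors scaled by the local cooperator density.
    Cooperators bear no punishment cost: the 'unbalanced' part."""
    rho = local_coop_density(strategies)        # mesoscale statistic per agent
    n_coop = strategies + 4 * rho               # cooperators in the focal group
    share = r * c * n_coop / 5                  # equal share of the public pot
    pi = share - c * strategies                 # cooperators pay contribution c
    pi -= beta * rho * (1 - strategies)         # fine defectors, proportional to rho
    return pi

# Usage: random initial strategies on a 20x20 grid.
rng = np.random.default_rng(0)
s = rng.integers(0, 2, size=(20, 20))
print(payoffs_with_unbalanced_punishment(s).mean())
```

In the full framework, `rho` would also enter each agent's observation so the PPO policy gradient is conditioned on the neighborhood statistic; the sketch above shows only the payoff-reshaping side.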