Black-box coevolution in mixed-motive games is often undermined by opponent-drift non-stationarity and noisy rollouts, which distort progress signals and can induce cycling, Red-Queen dynamics, and detachment. We propose the \emph{Marker Gene Method} (MGM), a curriculum-inspired governance mechanism that stabilizes selection by anchoring evaluation to cross-generational marker individuals, combined with DWAM and conservative marker-update rules to suppress spurious updates. We also introduce NGD-Div, which adapts the key update threshold using a divergence proxy and natural-gradient optimization. We provide theoretical analysis in strictly competitive settings and evaluate MGM integrated with evolution strategies (MGM-E-NES) on coordination games and a resource-depletion Markov game. MGM-E-NES reliably recovers target coordination in Stag Hunt and Battle of the Sexes, achieving final cooperation probabilities close to $(1,1)$ for the two players ($0.991\pm0.01/1.00\pm0.00$ and $0.97\pm0.00/0.97\pm0.00$ in the respective games). In the Markov resource game, it maintains high and stable state-conditioned cooperation across 30 seeds, with final cooperation rates of $\approx 0.954/0.980/0.916$ in the \textsc{Rich}/\textsc{Poor}/\textsc{Collapsed} states (both players; small standard deviations), indicating welfare-aligned, state-dependent behavior. Overall, MGM-E-NES transfers across tasks with minimal hyperparameter changes and yields consistently stable training dynamics, showing that top-level governance can substantially improve the robustness of black-box coevolution in dynamic environments.
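To make the anchoring idea concrete, the following is a minimal illustrative sketch, not the paper's actual MGM-E-NES: it omits DWAM and NGD-Div, and the one-parameter logistic policies, Stag Hunt payoffs, refresh interval, and names such as `es_step` and `marker_a` are all assumptions introduced here. It shows the core mechanism of evaluating evolution-strategies fitness against a fixed cross-generational marker opponent rather than the drifting current opponent.

```python
import math
import random

# Stag Hunt payoffs for (row, col) actions: 0 = Stag (cooperate), 1 = Hare.
PAYOFF = {(0, 0): (4, 4), (0, 1): (1, 3), (1, 0): (3, 1), (1, 1): (2, 2)}

def coop_prob(theta):
    """Logistic policy: probability of playing Stag."""
    return 1.0 / (1.0 + math.exp(-theta))

def expected_payoff(theta_a, theta_b):
    """Exact expected payoff of player A against player B."""
    pa, pb = coop_prob(theta_a), coop_prob(theta_b)
    return sum(wa * wb * PAYOFF[(a, b)][0]
               for a, wa in ((0, pa), (1, 1.0 - pa))
               for b, wb in ((0, pb), (1, 1.0 - pb)))

def es_step(theta, marker_theta, sigma=0.1, lr=0.5, pop=50, rng=random):
    """One evolution-strategies gradient step; fitness is evaluated against
    a *fixed* marker opponent, so the progress signal does not drift with
    the co-adapting opponent."""
    base = expected_payoff(theta, marker_theta)  # baseline for variance reduction
    grad = 0.0
    for _ in range(pop):
        eps = rng.gauss(0.0, 1.0)
        fit = expected_payoff(theta + sigma * eps, marker_theta)
        grad += (fit - base) * eps / (pop * sigma)
    return theta + lr * grad

random.seed(0)
theta_a = theta_b = 0.5                  # slight initial bias toward Stag
marker_a, marker_b = theta_a, theta_b    # cross-generational anchors
for gen in range(300):
    theta_a = es_step(theta_a, marker_b)
    theta_b = es_step(theta_b, marker_a)
    if (gen + 1) % 20 == 0:              # conservative (infrequent) marker refresh
        marker_a, marker_b = theta_a, theta_b

print(coop_prob(theta_a), coop_prob(theta_b))
```

With both players anchored to slowly refreshed markers, the gradient each player follows stays stationary between refreshes, and both policies settle on the payoff-dominant Stag equilibrium.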