Selection as Power argued that upstream selection authority, rather than internal objective misalignment, constitutes a primary source of risk in high-stakes agentic systems. However, the original framework was static: governance constraints bounded selection power but did not adapt over time. In this work, we extend the framework to dynamic settings by introducing incentivized selection governance, where reinforcement updates are applied to scoring and reducer parameters under externally enforced sovereignty constraints. We formalize selection as a constrained reinforcement process in which parameter updates are projected onto governance-defined feasible sets, preventing concentration beyond prescribed bounds. Across multiple regulated financial scenarios, unconstrained reinforcement consistently collapses into deterministic dominance under repeated feedback, especially at higher learning rates. In contrast, incentivized governance enables adaptive improvement while maintaining bounded selection concentration. Projection-based constraints transform reinforcement from irreversible lock-in into controlled adaptation, with governance debt quantifying the tension between optimization pressure and authority bounds. These results demonstrate that learning dynamics can coexist with structural diversity when sovereignty constraints are enforced at every update step, offering a principled approach to integrating reinforcement into high-stakes agentic systems without surrendering bounded selection authority.
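The projection mechanism described above can be illustrated with a minimal sketch. The snippet below is a hypothetical construction, not the paper's implementation: scoring weights over candidates induce a softmax selection distribution, a policy-gradient-style reinforcement step is applied, and the updated weights are then projected back onto a feasible set defined by a cap on the maximum selection probability (standing in for the "governance-defined feasible set"). The function names, the softmax parameterization, and the bisection-based projection are all assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def project_to_cap(w, cap):
    """Project scoring weights so no candidate's selection probability
    exceeds `cap` (a stand-in for the sovereignty bound).
    Shrinks the centered weights by a factor found via bisection;
    assumes cap >= 1/len(w) so the uniform point is feasible."""
    if softmax(w).max() <= cap:
        return w
    c = w - w.mean()
    lo, hi = 0.0, 1.0  # lo is always feasible, hi infeasible
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if softmax(mid * c).max() > cap:
            hi = mid
        else:
            lo = mid
    return lo * c

def incentivized_update(w, rewards, lr, cap):
    """One reinforcement step on scoring weights, followed by
    projection onto the capped feasible set."""
    p = softmax(w)
    # policy-gradient-style update toward higher-reward candidates
    grad = p * (rewards - np.dot(p, rewards))
    return project_to_cap(w + lr * grad, cap)
```

Under repeated feedback, the unconstrained update drives the distribution toward deterministic dominance of the highest-reward candidate, while the projected update adapts but stays within the concentration bound, which is the qualitative contrast the abstract reports.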