Regulatory institutions, from content moderation platforms to financial supervisors, observe, deliberate, and intervene only after a characteristic delay. We ask whether this processing lag alone can destabilize a multi-agent system that would otherwise remain stable, in the absence of exogenous shocks, coordination among agents, or malicious actors. We study this question in two stages. First, we analyze a delayed replicator equation in which autonomous agents benefit from radical behavior but are punished according to a lagged institutional alarm signal. We derive a closed-form critical delay beyond which the unique interior equilibrium loses stability through a Hopf bifurcation, and we prove via center manifold reduction that the bifurcation is supercritical (bounded oscillations rather than explosive growth) for the entire sigmoid response family. Second, we simulate N=240 agents on a network and cross institutional delay with three decision architectures: fixed-policy, reactive (a memoryless threshold heuristic), and Q-learning (tabular reinforcement learning). The resulting hierarchy contradicts the naive expectation that learning amplifies instability. Reactive agents are perfectly stable without delay yet collapse once delay is introduced (96% runaway rate at delay ≥ 8); fixed-policy agents are immune (0% runaway at every delay); and Q-learning agents are only partially resilient (66% runaway at delay 20). The destabilizing ingredient is therefore reactivity to delayed signals, not learning: agents that immediately exploit low-alarm windows trigger oscillatory feedback loops, whereas learning buffers this effect through punishment memory encoded in value functions. Throughout, "runaway" denotes a bounded, large-amplitude oscillation that crosses a radical-fraction threshold, consistent with the supercritical bifurcation, not unbounded growth.
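The mechanism of the first stage can be illustrated with a minimal sketch. The abstract does not state the equation, so the model form here is an assumption: the radical fraction x obeys x'(t) = x(1 - x)[B - C·σ(K(x(t - τ) - THETA))], meaning the alarm is a logistic function of the radical fraction observed τ time units earlier. The constants B, C, K, THETA and all helper functions are hypothetical. Under this assumption, linearizing at the interior equilibrium x* gives the scalar delay equation y'(t) = -a·y(t - τ), which is asymptotically stable exactly when aτ < π/2. That makes τ_c = π/(2a) a closed-form critical delay for this toy model. It is a standard result for that linear equation and is not necessarily the paper's expression.

```python
import math

# Hypothetical parameters (illustrative assumptions, not taken from the paper):
B = 1.0      # benefit of radical behavior
C = 2.0      # maximum punishment when the alarm is fully on
K = 10.0     # steepness of the sigmoid alarm response
THETA = 0.3  # radical fraction at which the alarm is half on


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def payoff_gap(x_lagged: float) -> float:
    """Net payoff of radical behavior when the (assumed) alarm reacts to the lagged radical fraction."""
    return B - C * sigmoid(K * (x_lagged - THETA))


def critical_delay() -> tuple[float, float]:
    """Return the interior equilibrium x* and the critical delay tau_c = pi / (2a).

    Linearizing x' = x(1-x) * payoff_gap(x(t - tau)) at x* gives y' = -a * y(t - tau),
    with a = x*(1-x*) * C * K * s * (1 - s) and s = sigmoid(K (x* - THETA)) = B / C.
    """
    s = B / C                                    # alarm level that balances benefit and punishment
    x_star = THETA + math.log(s / (1.0 - s)) / K  # solves payoff_gap(x*) = 0
    a = x_star * (1.0 - x_star) * C * K * s * (1.0 - s)
    return x_star, math.pi / (2.0 * a)


def simulate(tau: float, t_end: float = 200.0, dt: float = 0.01) -> list[float]:
    """Forward-Euler integration with a constant pre-history slightly above x*."""
    x_star, _ = critical_delay()
    lag = max(1, round(tau / dt))       # delay expressed in integration steps
    xs = [x_star + 0.01] * (lag + 1)    # history on the interval [-tau, 0]
    for _ in range(int(t_end / dt)):
        x, x_lagged = xs[-1], xs[-1 - lag]
        xs.append(x + dt * x * (1.0 - x) * payoff_gap(x_lagged))
    return xs


def late_amplitude(xs: list[float], frac: float = 0.25) -> float:
    """Peak-to-trough range over the final fraction of the trajectory."""
    tail = xs[-int(len(xs) * frac):]
    return max(tail) - min(tail)


if __name__ == "__main__":
    x_star, tau_c = critical_delay()
    print(f"x* = {x_star:.3f}, tau_c = {tau_c:.3f}")
    for tau in (0.5 * tau_c, 1.5 * tau_c):
        print(f"tau = {tau:.3f}: late amplitude = {late_amplitude(simulate(tau)):.4f}")
```

Running the script compares the late-time oscillation amplitude at half and at one-and-a-half times τ_c. Below τ_c the initial perturbation decays back to x*. Above it, the perturbation grows into a persistent oscillation that the replicator factor x(1 - x) keeps strictly inside (0, 1), a toy analogue of bounded "runaway". Forward Euler with a list-based history is used only for transparency; a dedicated delay-differential-equation solver would be preferable for quantitative work.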