Standard game theory explains cooperation in repeated games through conditional strategies such as Tit-for-Tat (TfT), but these strategies require continuous computation that imposes physical costs on embodied agents. We propose a three-layer Body-Reservoir Governance (BRG) architecture: (1) a body reservoir (an echo state network) whose $d$-dimensional state performs implicit inference over the interaction history, serving as both decision-maker and anomaly detector; (2) a cognitive filter providing costly strategic tools activated on demand; and (3) a metacognitive governance layer with receptivity parameter $\alpha \in [0,1]$. At full body governance ($\alpha = 1$), the closed-loop dynamics satisfy a self-consistency equation: cooperation is expressed as the reservoir's fixed point rather than computed. Strategy complexity cost is defined as the KL divergence between the reservoir's state distribution and its habituated baseline. Body governance reduces this cost, with action variance decreasing by up to $1600\times$ as the dimension $d$ grows. A dynamic sentinel generates a composite discomfort signal from the reservoir's own state, driving an adaptive $\alpha(t)$ that stays near baseline during cooperation and drops rapidly upon defection to activate cognitive retaliation. Overriding the body incurs a thermodynamic cost proportional to the internal state distortion. The sentinel achieves the highest payoff across all conditions, outperforming static body governance, TfT, and EMA baselines. A dimension sweep ($d \in \{5,\ldots,100\}$) shows that implicit inference scales with bodily richness ($23\times$ to $1600\times$ variance reduction), attributable to reservoir dynamics. A phase diagram in $(d, \tau_{\mathrm{env}})$ space reveals governance regime transitions near $d \approx 20$. The framework reinterprets cooperation as the minimum-dissipation response of an adapted dynamical system -- emergent from embodied dynamics rather than computed.
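The reservoir update and sentinel-driven receptivity described above can be sketched minimally in Python. This is an illustrative toy, not the paper's implementation: the random weights, the discomfort signal (raw deviation from cooperation), and the 0.5 drop / 0.05 recovery rates for $\alpha(t)$ are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20  # reservoir dimension (illustrative; near the reported regime transition)

# Random reservoir weights, rescaled so the spectral radius is below 1
# (a standard sufficient condition for the echo state property).
W = rng.normal(size=(d, d))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))
W_in = rng.normal(size=(d, 1))
w_out = rng.normal(size=d) / d  # fixed linear readout, illustrative only

x = np.zeros(d)   # reservoir state
alpha = 1.0       # receptivity: 1.0 = full body governance

def step(opponent_action):
    """One interaction round: update reservoir, adapt alpha, blend actions."""
    global x, alpha
    u = np.array([float(opponent_action)])   # 1 = cooperate, 0 = defect
    x = np.tanh(W @ x + W_in @ u)            # echo state network update

    # Hypothetical sentinel: discomfort grows with deviation from cooperation;
    # alpha drops sharply on defection, then slowly recovers toward baseline.
    discomfort = abs(float(opponent_action) - 1.0)
    alpha = max(0.0, alpha - 0.5 * discomfort)
    alpha = min(1.0, alpha + 0.05)

    body_action = 1.0 / (1.0 + np.exp(-(w_out @ x)))  # implicit body policy
    cognitive_action = float(opponent_action)          # costly TfT-like tool
    return alpha * body_action + (1.0 - alpha) * cognitive_action

for _ in range(10):          # sustained cooperation: alpha stays at baseline
    a = step(1)
a_after_defect = step(0)     # defection: alpha drops, cognitive layer engages
```

During cooperation the blended action is produced almost entirely by the reservoir's fixed-point dynamics ($\alpha \approx 1$); a single defection pulls $\alpha$ down, mixing in the cognitive (retaliatory) action, which mirrors the sentinel behavior the abstract describes.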