Paper A defines a time-consistent actuarial runtime that prices each side-effect-bearing action against a contractually fixed safe default and gates execution against a reserve budget. It treats the operator as passive. This paper makes the operator strategic. We characterise a five-attack space for autonomous AI-agent insurance contracts and prove when the actuarial runtime is gaming-resistant. Two attack surfaces -- post-toll safe-default selection and within-boundary action splitting -- are closed by Paper A's minimal-authority and no-splitting clauses. The remaining three require new contract clauses. First, common-control aggregation prevents cross-boundary re-routing from reducing toll below the boundary potential applied to total exposure. Second, interface failures such as invalid JSON are contract-relevant events, not safety wins: treating them as zero-toll safe defaults can reward unreliable models, while escalation fees reverse the incentive. We validate this interface-compliance theorem on committed cross-model traces from the companion empirical paper. Third, a model-identity menu with a componentwise-minimum penalty schedule makes truthful reporting of the deployed model weakly dominant. We then compose these clauses with Paper A's runtime guarantees to obtain joint incentive compatibility over the five-attack space. Finally, a two-parameter premium family discharges operator individual rationality and weak budget balance at the truthful equilibrium. The result is an incentive-compatibility layer for actuarial control of autonomous-agent side effects.
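The interface-failure claim can be illustrated with a toy calculation. The sketch below is not the paper's theorem or its contract schedule: the function name `expected_charge`, the per-action toll, the failure rates, and the fee level are all hypothetical values chosen for illustration. It assumes each attempted action either succeeds and pays a fixed toll, or fails at the interface (for example, by emitting invalid JSON) and pays an escalation fee.

```python
def expected_charge(failure_rate: float, toll: float, escalation_fee: float = 0.0) -> float:
    """Expected charge per attempted action in a toy two-outcome model.

    With probability (1 - failure_rate) the action is valid and pays `toll`;
    with probability failure_rate it fails at the interface and pays
    `escalation_fee`. A fee of 0.0 models the zero-toll safe-default treatment.
    All parameters are illustrative assumptions, not values from the paper.
    """
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must lie in [0, 1]")
    return (1.0 - failure_rate) * toll + failure_rate * escalation_fee


# Hypothetical failure rates for a reliable and an unreliable model.
reliable, unreliable = 0.01, 0.30
toll = 1.0

# Zero-toll treatment: interface failures cost nothing.
zero_toll = [expected_charge(p, toll) for p in (reliable, unreliable)]

# Escalation-fee treatment: failures cost more than a valid action's toll.
with_fee = [expected_charge(p, toll, escalation_fee=1.5) for p in (reliable, unreliable)]

print("zero-toll charges:", zero_toll)
print("escalation-fee charges:", with_fee)
```

In this toy model the expected charge falls as the failure rate rises when failures are free, so the unreliable model pays less. Once the escalation fee exceeds the toll, the ordering reverses and the unreliable model pays more.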