Test-then-Punish: A Statistical Approach to Repeated Games

We study discounted infinitely repeated games in which players agree on a cooperative mixed action profile but, at each step, observe only the realized pure actions. This form of imperfect monitoring breaks classical trigger strategies, since deviations cannot be identified with certainty. To address this problem, we study how hypothesis testing can be used to sustain cooperation. First, we develop a framework that embeds statistical inference directly into strategic behavior. We introduce relaxed equilibrium notions that allow players to ignore vanishing-probability histories arising from rare but extreme realizations of the monitoring process. Within this framework, we formalize a generic test-then-punish strategy: players commit ex ante to a cooperative mixed action profile, continuously test whether observed play is consistent with this prescription, and permanently switch to punishment once sufficient statistical evidence of deviation accumulates. Under mild conditions on the testing procedure, this construction sustains any feasible and individually rational payoff for sufficiently patient players, yielding a folk-theorem-type result under imperfect monitoring. We then propose two explicit implementations of this strategy. The first relies on anytime-valid sequential tests, providing uniform control of Type I error over an infinite horizon and a finite expected detection time for payoff-relevant deviations. However, this implementation only accounts for stationary deviations and yields only a Nash equilibrium. The second tests over batches of fixed size, accommodating arbitrary deviations and achieving a subgame-perfect Nash equilibrium, at the cost of losing global anytime guarantees on false punishments.
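To make the test-then-punish idea concrete, the following is a minimal sketch of one way a player could monitor an opponent's mixed action, not the construction used in the paper. It assumes a two-action game in which the opponent is supposed to play action 1 with probability `p0`, and it uses a uniform mixture of likelihood ratios over a hypothetical grid of stationary alternatives. Because the mixture is a nonnegative martingale with mean 1 under the prescribed play, Ville's inequality bounds the probability of ever crossing `1 / alpha` by `alpha`, which is the anytime-valid Type I guarantee. The class name `SequentialMonitor` and the parameters `alpha` and `grid_size` are illustrative assumptions.

```python
import math


class SequentialMonitor:
    """Illustrative test-then-punish monitor for a Bernoulli mixed action.

    Assumption: the opponent should play action 1 with probability p0.
    The test statistic is a uniform mixture of likelihood ratios over a
    grid of stationary alternatives (a hypothetical choice for this sketch).
    """

    def __init__(self, p0, alpha=0.01, grid_size=19):
        if not 0.0 < p0 < 1.0:
            raise ValueError("p0 must lie strictly between 0 and 1")
        self.p0 = p0
        # Punish once the mixture likelihood ratio reaches 1 / alpha.
        self.log_threshold = math.log(1.0 / alpha)
        # Evenly spaced alternative probabilities in the open interval (0, 1).
        self.alternatives = [(k + 1) / (grid_size + 1) for k in range(grid_size)]
        # Running log-likelihood ratio of each alternative against p0.
        self.log_lr = [0.0] * grid_size
        self.punishing = False

    def observe(self, action):
        """Update with one realized action (0 or 1); return True if punishing."""
        if self.punishing:
            return True  # punishment is permanent once triggered
        for i, q in enumerate(self.alternatives):
            if action == 1:
                self.log_lr[i] += math.log(q / self.p0)
            else:
                self.log_lr[i] += math.log((1.0 - q) / (1.0 - self.p0))
        # Log of the uniform mixture, computed with log-sum-exp for stability.
        m = max(self.log_lr)
        total = sum(math.exp(x - m) for x in self.log_lr)
        log_mixture = m + math.log(total / len(self.log_lr))
        if log_mixture >= self.log_threshold:
            self.punishing = True
        return self.punishing
```

In this sketch, an opponent who always plays action 1 when `p0 = 0.5` is flagged after a handful of rounds, while play whose empirical frequency stays close to `p0` keeps the statistic small. Like the first implementation described above, a test of this kind only targets stationary deviations.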
