In this paper we investigate the exploitability of a Follow-the-Regularized-Leader (FTRL) learner with constant step size $\eta$ in $n\times m$ two-player zero-sum games played over $T$ rounds against a clairvoyant optimizer. In contrast with prior analyses, we show that exploitability is an inherent feature of the FTRL family rather than an artifact of specific instantiations. First, for a fixed optimizer, we establish a sweeping law of order $\Omega(N/\eta)$, proving that exploitation scales with the number $N$ of the learner's suboptimal actions and vanishes in their absence. Second, for an alternating optimizer, a surplus of $\Omega(\eta T/\mathrm{poly}(n,m))$ can be guaranteed with high probability in random games, regardless of the equilibrium structure. Our analysis once more uncovers a sharp geometric dichotomy: non-steep regularizers allow the optimizer to extract maximum surplus via finite-time elimination of suboptimal actions, whereas steep ones introduce a vanishing correction that may delay exploitation. Finally, we discuss whether this leverage persists under bilateral payoff uncertainty, and we propose a susceptibility measure to quantify which regularizers are most vulnerable to strategic manipulation.
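The dichotomy between non-steep and steep regularizers can be illustrated with a minimal sketch. It assumes a fixed optimizer, so the learner faces the same payoff vector every round, and it uses the Euclidean regularizer (non-steep) and the negative entropy (steep) as standard representatives of each class; these choices, the payoff vector, the step size, and the horizon are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]                      # scores in descending order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index kept in the support
    tau = css[rho] / (rho + 1)                # shift that makes the result sum to 1
    return np.maximum(v - tau, 0.0)

def softmax(v):
    """Choice map of the entropic regularizer (numerically stabilized)."""
    z = np.exp(v - v.max())
    return z / z.sum()

def ftrl_trajectory(payoffs, eta, T, choice_map):
    """FTRL with constant step size eta against a fixed optimizer.

    payoffs[i] is the learner's per-round payoff for action i (hypothetical).
    Returns a (T, len(payoffs)) array of mixed strategies.
    """
    cumulative = np.zeros_like(payoffs, dtype=float)
    history = []
    for _ in range(T):
        history.append(choice_map(eta * cumulative))
        cumulative += payoffs
    return np.array(history)

# Hypothetical instance: action 0 is the unique best response, so N = 2.
payoffs = np.array([1.0, 0.5, 0.0])
eta, T = 0.1, 200

euclid = ftrl_trajectory(payoffs, eta, T, project_simplex)   # non-steep
entropic = ftrl_trajectory(payoffs, eta, T, softmax)         # steep

print("Euclidean, final strategy:", euclid[-1])
print("Entropic,  final strategy:", entropic[-1])
```

In this toy instance the Euclidean learner places exactly zero mass on both suboptimal actions after finitely many rounds, while the entropic learner keeps strictly positive, exponentially shrinking mass on them throughout. This matches the qualitative picture of finite-time elimination versus a vanishing correction, but it is not a reproduction of the paper's bounds.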