Recent breakthrough results by Dagan, Daskalakis, Fishelson and Golowich [2023] and Peng and Rubinstein [2023] established an efficient algorithm attaining at most $\epsilon$ swap regret over extensive-form strategy spaces of dimension $N$ in $N^{\tilde O(1/\epsilon)}$ rounds. At the other extreme, Farina and Pipis [2023] developed an efficient algorithm for minimizing the weaker notion of linear-swap regret in $\mathsf{poly}(N)/\epsilon^2$ rounds. In this paper, we take a step toward bridging the gap between those two results. We introduce the set of $k$-mediator deviations, which generalize the untimed communication deviations recently introduced by Zhang, Farina and Sandholm [2024] to the case of multiple mediators. We develop parameterized algorithms for minimizing the regret with respect to this set of deviations in $N^{O(k)}/\epsilon^2$ rounds. This closes the gap in the sense that $k=1$ recovers linear-swap regret, while $k=N$ recovers swap regret. Moreover, by relating $k$-mediator deviations to low-degree polynomials, we show that regret minimization against degree-$k$ polynomial swap deviations is achievable in $N^{O(kd)^3}/\epsilon^2$ rounds, where $d$ is the depth of the game, assuming a constant branching factor. For a fixed degree $k$, this bound is polynomial for Bayesian games and, more broadly, quasipolynomial when $d = \mathsf{polylog} N$, which is the usual balancedness assumption on the game tree.
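As a rough sanity check on the round bounds above, the following sketch expands $N^{O(kd)^3}/\epsilon^2$ in the two regimes mentioned. It assumes that Bayesian games have depth $d = O(1)$ (inferred from the polynomial claim, not stated explicitly above) and writes the balancedness assumption as $d = \log^c N$ for a hypothetical constant $c$.

$$
\begin{aligned}
k, d = O(1): \quad & N^{O(kd)^3}/\epsilon^2 = N^{O(1)}/\epsilon^2 = \mathsf{poly}(N)/\epsilon^2, \\
k = O(1),\ d = \log^c N: \quad & N^{O(kd)^3}/\epsilon^2 = N^{O(\log^{3c} N)}/\epsilon^2 = 2^{O(\log^{3c+1} N)}/\epsilon^2 .
\end{aligned}
$$

The second expression is of the form $2^{\mathsf{polylog} N}/\epsilon^2$, that is, quasipolynomial in $N$. Similarly, for the $k$-mediator bound, $k=1$ gives $N^{O(1)}/\epsilon^2$, which is consistent with the $\mathsf{poly}(N)/\epsilon^2$ rate for linear-swap regret.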