Towards Characterizing the First-order Query Complexity of Learning (Approximate) Nash Equilibria in Zero-sum Matrix Games

In the first-order query model for zero-sum $K\times K$ matrix games, players observe the expected pay-offs for all their possible actions under the randomized action played by their opponent. This is a classical model, which has received renewed interest after the discovery by Rakhlin and Sridharan that $\epsilon$-approximate Nash equilibria can be computed efficiently from $O(\ln K / \epsilon)$ instead of $O(\ln K / \epsilon^2)$ queries. Surprisingly, the optimal number of such queries, as a function of both $\epsilon$ and $K$, is not known. We make progress on this question on two fronts. First, we fully characterize the query complexity of learning exact equilibria ($\epsilon=0$), by showing that they require a number of queries that is linear in $K$, so learning an exact equilibrium is essentially as hard as querying the whole matrix, which can also be done with $K$ queries. Second, for $\epsilon > 0$, the current query complexity upper bound stands at $O(\min(\ln(K) / \epsilon , K))$. We argue that, unfortunately, obtaining a matching lower bound is not possible with existing techniques: we prove that no lower bound can be derived by constructing hard matrices whose entries take values in a known countable set, because such matrices can be fully identified by a single query. This rules out, for instance, reducing to a submodular optimization problem over the hypercube by encoding it as a binary matrix. We then introduce a new technique for lower bounds, which allows us to obtain lower bounds of order $\tilde\Omega(\log(1 / (K\epsilon)))$ for any $\epsilon \leq 1 / cK^4$, where $c$ is a constant independent of $K$. We further discuss possible future directions to improve on our techniques in order to close the gap with the upper bounds.

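To make the query model concrete, the following is a minimal Python sketch of a first-order query and of two claims from the abstract: that $K$ queries suffice to read the whole matrix, and that a matrix with entries in a known countable set can be identified by a single query. Only the simplest such set, $\{0,1\}$, is shown. The function names, the convention that a query returns both $Aq$ and $p^\top A$, and the geometric-weights encoding are assumptions made for illustration; they are not necessarily the construction used in the paper.

```python
from fractions import Fraction


def first_order_query(A, p, q):
    """Hypothetical oracle: given mixed strategies p (rows) and q (columns),
    return the expected pay-off of every row action (A q) and of every
    column action (p^T A)."""
    K = len(A)
    row_payoffs = [sum(A[i][j] * q[j] for j in range(K)) for i in range(K)]
    col_payoffs = [sum(p[i] * A[i][j] for i in range(K)) for j in range(K)]
    return row_payoffs, col_payoffs


def recover_with_K_queries(A):
    """Read the whole matrix with K queries: playing pure column j
    reveals column j of A as the row player's pay-off vector."""
    K = len(A)
    uniform = [Fraction(1, K)] * K  # the row strategy is irrelevant here
    columns = []
    for j in range(K):
        e_j = [Fraction(int(i == j)) for i in range(K)]
        row_payoffs, _ = first_order_query(A, uniform, e_j)
        columns.append(row_payoffs)
    # Transpose the list of columns back into rows.
    return [[columns[j][i] for j in range(K)] for i in range(K)]


def recover_binary_with_one_query(A):
    """Identify a 0/1 matrix with a single query (illustrative encoding).

    The column strategy puts weight 2^(K-1-j) / (2^K - 1) on column j,
    so (A q)_i * (2^K - 1) is an integer whose binary digits are row i.
    """
    K = len(A)
    norm = 2**K - 1
    q = [Fraction(2 ** (K - 1 - j), norm) for j in range(K)]
    uniform = [Fraction(1, K)] * K
    row_payoffs, _ = first_order_query(A, uniform, q)
    decoded = []
    for value in row_payoffs:
        n = int(value * norm)  # exact, because Fractions are used throughout
        decoded.append([(n >> (K - 1 - j)) & 1 for j in range(K)])
    return decoded


A = [[0, 1, 1],
     [1, 0, 0],
     [1, 1, 0]]
print(recover_with_K_queries(A) == A)
print(recover_binary_with_one_query(A) == A)
```

Exact rational arithmetic (`fractions.Fraction`) is used because the single-query decoding relies on reading the pay-offs to full precision; with floating-point pay-offs the same idea would only work for small $K$.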