We study the sample complexity of identifying the pure strategy Nash equilibrium (PSNE) in a two-player zero-sum matrix game with noise. Formally, we are given a stochastic model in which a learner can sample an entry $(i,j)$ of the input matrix $A\in[-1,1]^{n\times m}$ and observe $A_{i,j}+\eta$, where $\eta$ is zero-mean 1-sub-Gaussian noise. The aim of the learner is to identify the PSNE of $A$, whenever it exists, with high probability while taking as few samples as possible. Zhou et al. (2017) present an instance-dependent sample complexity lower bound that depends only on the entries in the row and column in which the PSNE lies. We design a near-optimal algorithm whose sample complexity matches this lower bound up to log factors. The problem of identifying the PSNE also generalizes the problem of pure exploration in stochastic multi-armed bandits and dueling bandits, and our result matches the optimal bounds, up to log factors, in both settings.
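To make the setup concrete, the following is a minimal Python sketch of the sampling model and of what "identifying the PSNE" means. It is not the near-optimal algorithm described above: `naive_identify` is a hypothetical uniform-sampling baseline that spends the same number of samples on every entry. The sketch assumes the convention that the row player maximizes and the column player minimizes, so a PSNE is an entry that is the smallest in its row and the largest in its column. It also assumes standard Gaussian noise as one example of 1-sub-Gaussian noise. All function names and the example matrix are illustrative.

```python
import random


def find_psne(A):
    """Return (i, j) where A[i][j] is a row minimum and a column maximum.

    Assumed convention: the row player maximizes, the column player minimizes.
    Returns None if no such saddle point exists.
    """
    n = len(A)
    for i, row in enumerate(A):
        for j, value in enumerate(row):
            is_row_min = all(value <= x for x in row)
            is_col_max = all(value >= A[k][j] for k in range(n))
            if is_row_min and is_col_max:
                return (i, j)
    return None


def sample_entry(A, i, j, rng):
    """One noisy observation A[i][j] + eta with eta ~ N(0, 1)."""
    return A[i][j] + rng.gauss(0.0, 1.0)


def naive_identify(A, samples_per_entry, rng):
    """Hypothetical baseline: sample every entry equally, then look for
    a saddle point in the matrix of empirical means."""
    n, m = len(A), len(A[0])
    means = [
        [
            sum(sample_entry(A, i, j, rng) for _ in range(samples_per_entry))
            / samples_per_entry
            for j in range(m)
        ]
        for i in range(n)
    ]
    return find_psne(means)


# Illustrative 3x3 instance whose PSNE is at row 0, column 1.
EXAMPLE_A = [
    [0.5, 0.2, 0.9],
    [-0.3, -0.8, 0.4],
    [0.7, -0.5, -0.6],
]

# A game with no PSNE (matching pennies).
MATCHING_PENNIES = [
    [1.0, -1.0],
    [-1.0, 1.0],
]
```

The baseline uses $n \cdot m$ times the per-entry budget regardless of the instance, whereas the lower bound cited above depends only on the entries in the PSNE's row and column. That gap is what an instance-dependent algorithm aims to close.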