While empirical risk minimization (ERM) suffices to attain near-optimal generalization error in the stochastic learning setting, this is not known to be the case in the online learning setting, where algorithms for general concept classes rely on computationally inefficient oracles such as the Standard Optimal Algorithm (SOA). In this work, we propose an algorithm for the online binary classification setting that relies solely on ERM oracle calls, and show that it has finite regret in the realizable setting and sublinearly growing regret in the agnostic setting. We bound the regret in terms of the Littlestone and threshold dimensions of the underlying concept class. We obtain similar results for nonparametric games, where the ERM oracle can be interpreted as a best response oracle that finds the best response of a player to a given history of play of the other players. In this setting, we provide learning algorithms that rely only on best response oracles and converge to approximate minimax equilibria in two-player zero-sum games and approximate coarse correlated equilibria in multi-player general-sum games, as long as the game has a bounded fat-threshold dimension. Our algorithms apply to both binary-valued and real-valued games and can be viewed as providing justification for the wide use of double oracle and multiple oracle algorithms in the practice of solving large games.
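To make the notion of a best response oracle concrete, the following is a minimal sketch of classical fictitious play on a small two-player zero-sum matrix game. It is not the algorithm proposed in this work; it only illustrates a learner whose sole access to the game is an oracle returning a best response to the opponent's history of play. The function names (`best_response_row`, `best_response_col`, `fictitious_play`, `duality_gap`) and the rock-paper-scissors payoff matrix are assumptions chosen for illustration.

```python
# Illustrative sketch only: classical fictitious play, NOT the paper's algorithm.
# All names and the example game below are hypothetical.

def best_response_row(payoff, col_counts):
    """Oracle: row action maximizing payoff against the column player's history."""
    scores = [sum(payoff[i][j] * col_counts[j] for j in range(len(col_counts)))
              for i in range(len(payoff))]
    return max(range(len(scores)), key=scores.__getitem__)

def best_response_col(payoff, row_counts):
    """Oracle: column action minimizing the row player's payoff against its history."""
    scores = [sum(payoff[i][j] * row_counts[i] for i in range(len(row_counts)))
              for j in range(len(payoff[0]))]
    return min(range(len(scores)), key=scores.__getitem__)

def fictitious_play(payoff, rounds):
    """Each round, both players best-respond to the other's empirical play so far."""
    row_counts = [0] * len(payoff)
    col_counts = [0] * len(payoff[0])
    row_counts[0] += 1  # arbitrary first actions
    col_counts[0] += 1
    for _ in range(rounds - 1):
        i = best_response_row(payoff, col_counts)
        j = best_response_col(payoff, row_counts)
        row_counts[i] += 1
        col_counts[j] += 1
    # Empirical mixed strategies over all rounds.
    x = [c / rounds for c in row_counts]
    y = [c / rounds for c in col_counts]
    return x, y

def duality_gap(payoff, x, y):
    """How much either player could gain by deviating; zero at a minimax equilibrium."""
    best_row = max(sum(payoff[i][j] * y[j] for j in range(len(y)))
                   for i in range(len(x)))
    best_col = min(sum(payoff[i][j] * x[i] for i in range(len(x)))
                   for j in range(len(y)))
    return best_row - best_col

# Rock-paper-scissors payoffs for the row player (rows/columns: rock, paper, scissors).
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

x, y = fictitious_play(RPS, 5000)
gap = duality_gap(RPS, x, y)
print("row strategy:", x)
print("column strategy:", y)
print("duality gap:", gap)
```

In a finite zero-sum game the empirical strategies of fictitious play are known to approach a minimax equilibrium, so the printed duality gap should shrink as `rounds` grows. The setting described above is harder: the action sets may be infinite, which is where the dimension assumptions on the game come in.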