While empirical risk minimization (ERM) suffices to attain near-optimal generalization error in the stochastic learning setting, this is not known to be the case in the online learning setting, where algorithms for general concept classes rely on computationally inefficient oracles such as the Standard Optimal Algorithm (SOA). In this work, we propose an algorithm for the online binary classification setting that relies solely on ERM oracle calls, and we show that it has finite regret in the realizable setting and sublinearly growing regret in the agnostic setting. We bound the regret in terms of the Littlestone and threshold dimensions of the underlying concept class. We obtain similar results for nonparametric games, where the ERM oracle can be interpreted as a best-response oracle that finds the best response of a player to a given history of play by the other players. In this setting, we provide learning algorithms that rely only on best-response oracles and converge to approximate minimax equilibria in two-player zero-sum games and to approximate coarse correlated equilibria in multi-player general-sum games, as long as the game has a bounded fat-threshold dimension. Our algorithms apply to both binary-valued and real-valued games and can be viewed as providing justification for the wide use of double oracle and multiple oracle algorithms in practice for solving large games.
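To make the notion of a best-response oracle concrete, here is a minimal sketch of the classic double oracle scheme for a finite two-player zero-sum matrix game. It is not the algorithm proposed in this work. Each restricted game is solved only approximately by Hedge (multiplicative-weights) self-play, which stands in for the exact linear program usually used. The helper names (`solve_restricted`, `row_best_response`, `col_best_response`, `double_oracle`), the starting actions, the payoff range of [-1, 1], and the tolerance `eps` are all hypothetical choices. In each round, the players' best responses to the current restricted solution bracket the game value, and the loop stops once that bracket is narrower than `eps`.

```python
import math
from typing import List, Sequence, Tuple

# A[i][j] is the payoff to the row (maximizing) player; the column player minimizes it.
# Assumption for this sketch: all payoffs lie in [-1, 1].
Matrix = Sequence[Sequence[float]]


def _hedge(scores: List[float], eta: float) -> List[float]:
    """Exponential-weights distribution over scores (shifted by the max for stability)."""
    top = max(scores)
    w = [math.exp(eta * (s - top)) for s in scores]
    z = sum(w)
    return [x / z for x in w]


def solve_restricted(A: Matrix, rows: List[int], cols: List[int],
                     iters: int = 5000) -> Tuple[List[float], List[float]]:
    """Approximately solve the game restricted to rows x cols by Hedge self-play.

    Returns the time-averaged mixed strategies of both players.
    """
    n, m = len(rows), len(cols)
    eta = math.sqrt(8 * math.log(max(n, m, 2)) / iters) / 2  # divide by payoff range 2
    gain = [0.0] * n  # cumulative expected payoff of each row action
    loss = [0.0] * m  # cumulative expected payoff conceded by each column action
    sum_p, sum_q = [0.0] * n, [0.0] * m
    for _ in range(iters):
        p = _hedge(gain, eta)
        q = _hedge([-x for x in loss], eta)
        for a in range(n):
            sum_p[a] += p[a]
            gain[a] += sum(q[b] * A[rows[a]][cols[b]] for b in range(m))
        for b in range(m):
            sum_q[b] += q[b]
            loss[b] += sum(p[a] * A[rows[a]][cols[b]] for a in range(n))
    return [x / iters for x in sum_p], [x / iters for x in sum_q]


def row_best_response(A: Matrix, q: List[float], cols: List[int]) -> Tuple[int, float]:
    """Best-response oracle: the row action (from ALL rows) maximizing payoff against q."""
    vals = [sum(qb * A[i][c] for qb, c in zip(q, cols)) for i in range(len(A))]
    i = max(range(len(A)), key=vals.__getitem__)
    return i, vals[i]


def col_best_response(A: Matrix, p: List[float], rows: List[int]) -> Tuple[int, float]:
    """Best-response oracle: the column action (from ALL columns) minimizing payoff against p."""
    vals = [sum(pa * A[r][j] for pa, r in zip(p, rows)) for j in range(len(A[0]))]
    j = min(range(len(A[0])), key=vals.__getitem__)
    return j, vals[j]


def double_oracle(A: Matrix, eps: float = 0.1, max_rounds: int = 50):
    """Grow restricted action sets with best responses until the value bracket is below eps."""
    rows, cols = [0], [0]  # hypothetical starting actions
    for _ in range(max_rounds):
        p, q = solve_restricted(A, rows, cols)
        i, upper = row_best_response(A, q, cols)  # upper bound on the game value
        j, lower = col_best_response(A, p, rows)  # lower bound on the game value
        gap = upper - lower
        if gap <= eps:
            break
        if i not in rows:
            rows.append(i)
        if j not in cols:
            cols.append(j)
    return rows, cols, (upper + lower) / 2, gap


if __name__ == "__main__":
    # Rock-paper-scissors: 0 = rock, 1 = paper, 2 = scissors.
    RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
    rows, cols, value, gap = double_oracle(RPS)
    print(sorted(rows), sorted(cols), value, gap)
```

On rock-paper-scissors, starting from rock alone, both players must add all three actions before the bracket can close, because the unique equilibrium mixes uniformly over them. Note that the oracles only ever answer "what is the best action against this mixture?", which is the kind of access the abstract refers to.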