In this work, we aim to characterize the statistical complexity of realizable regression in both the PAC learning setting and the online learning setting. Prior work established that finiteness of the fat-shattering dimension is sufficient for PAC learnability and that finiteness of the scaled Natarajan dimension is necessary, but little progress has been made towards a more complete characterization since the work of Simon (SICOMP '97). To this end, we first introduce a minimax instance-optimal learner for realizable regression and propose a novel dimension that characterizes, both qualitatively and quantitatively, which classes of real-valued predictors are learnable. We then identify a combinatorial dimension related to the Graph dimension that characterizes ERM learnability in the realizable setting. Finally, we establish a necessary condition for learnability based on a combinatorial dimension related to the DS dimension, and conjecture that it may also be sufficient in this context. Additionally, in the context of online learning, we provide a dimension that characterizes the minimax instance-optimal cumulative loss up to a constant factor and design an optimal online learner for realizable regression, thus resolving an open question raised by Daskalakis and Golowich (STOC '22).