In this work, we aim to characterize the statistical complexity of realizable regression in both the PAC learning setting and the online learning setting. Previous work had established that finiteness of the fat-shattering dimension is sufficient for PAC learnability and that finiteness of the scaled Natarajan dimension is necessary, but little progress had been made towards a more complete characterization since the work of Simon (SICOMP '97). To this end, we first introduce a minimax instance-optimal learner for realizable regression and propose a novel dimension that both qualitatively and quantitatively characterizes which classes of real-valued predictors are learnable. We then identify a combinatorial dimension related to the Graph dimension that characterizes ERM learnability in the realizable setting. Finally, we establish a necessary condition for learnability based on a combinatorial dimension related to the DS dimension, and conjecture that it may also be sufficient in this context. Additionally, in the context of online learning, we provide a dimension that characterizes the minimax instance-optimal cumulative loss up to a constant factor and design an optimal online learner for realizable regression, thus resolving an open question raised by Daskalakis and Golowich in STOC '22.