In this work, we aim to characterize the statistical complexity of realizable regression in both the PAC learning setting and the online learning setting. Previous work had established that finiteness of the fat-shattering dimension is sufficient for PAC learnability and that finiteness of the scaled Natarajan dimension is necessary, but little progress had been made towards a more complete characterization since the work of Simon (SICOMP '97). To this end, we first introduce a minimax instance-optimal learner for realizable regression and propose a novel dimension that characterizes, both qualitatively and quantitatively, which classes of real-valued predictors are learnable. We then identify a combinatorial dimension related to the Graph dimension that characterizes ERM learnability in the realizable setting. Finally, we establish a necessary condition for learnability based on a combinatorial dimension related to the DS dimension, and conjecture that it is also sufficient. Turning to online learning, we provide a dimension that characterizes the minimax instance-optimal cumulative loss up to a constant factor and design an optimal online learner for realizable regression, thus resolving an open question raised by Daskalakis and Golowich (STOC '22).