We generalize the notion of average Lipschitz smoothness proposed by Ashlagi et al. (COLT 2021) by extending it to H\"older smoothness. This measure of the ``effective smoothness'' of a function is sensitive to the underlying distribution and can be dramatically smaller than its classic ``worst-case'' H\"older constant. We prove nearly tight upper and lower risk bounds in terms of the average H\"older smoothness, establishing the minimax rate in the realizable regression setting up to log factors; this was not previously known even in the special case of average Lipschitz smoothness. From an algorithmic perspective, since our notion of average smoothness is defined with respect to the unknown sampling distribution, the learner does not have an explicit representation of the function class, hence is unable to execute ERM. Nevertheless, we provide a learning algorithm that achieves the (nearly) optimal learning rate. Our results hold in any totally bounded metric space, and are stated in terms of its intrinsic geometry. Overall, our results show that the classic worst-case notion of H\"older smoothness can be essentially replaced by its average, yielding considerably sharper guarantees.