We generalize the notion of average Lipschitz smoothness proposed by Ashlagi et al. (COLT 2021) by extending it to H\"older smoothness. This measure of the "effective smoothness" of a function is sensitive to the underlying distribution and can be dramatically smaller than the function's classic "worst-case" H\"older constant. We consider both the realizable and the agnostic (noisy) regression settings, proving upper and lower risk bounds in terms of the average H\"older smoothness. In both settings, these rates improve upon previously known rates even in the special case of average Lipschitz smoothness. Moreover, our lower bound is tight in the realizable setting up to log factors, thereby establishing the minimax rate. From an algorithmic perspective, our notion of average smoothness is defined with respect to the unknown underlying distribution. As a result, the learner has no explicit representation of the function class and hence cannot execute ERM. Nevertheless, we provide distinct learning algorithms that achieve both of these (nearly) optimal learning rates. Our results hold in any totally bounded metric space and are stated in terms of its intrinsic geometry. Overall, our results show that the classic worst-case notion of H\"older smoothness can essentially be replaced by its average, yielding considerably sharper guarantees.
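The abstract does not spell out the definition, so the following is only a sketch under an assumption. We assume the local β-H\"older slope at x is the supremum of |f(x) - f(y)| / ρ(x, y)^β over y ≠ x, mirroring the local-slope construction of Ashlagi et al. Under this assumption, the worst-case constant is the supremum of that slope over x, and the average smoothness is its expectation under the distribution μ. The helper names `local_holder_slopes` and `holder_constants` are hypothetical. The metric ρ is taken to be the absolute difference on the real line. Suprema are approximated by maxima over a finite sample, so every value is only a lower bound on the corresponding population quantity. The example contrasts the two constants for a steep step that μ rarely visits.

```python
from typing import List, Optional, Sequence, Tuple


def local_holder_slopes(xs: Sequence[float], fs: Sequence[float], beta: float) -> List[float]:
    """Empirical local beta-Holder slope at each sample point (hypothetical helper).

    Assumption for illustration: the supremum over the whole space is replaced
    by a maximum over the other sample points, so each value is a lower bound
    on the true local slope.
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in (0, 1]")
    slopes = []
    for i, (xi, fi) in enumerate(zip(xs, fs)):
        best = 0.0
        for j, (xj, fj) in enumerate(zip(xs, fs)):
            dist = abs(xi - xj)  # metric on the real line (assumed; any metric would do)
            if i != j and dist > 0.0:
                best = max(best, abs(fi - fj) / dist ** beta)
        slopes.append(best)
    return slopes


def holder_constants(
    xs: Sequence[float],
    fs: Sequence[float],
    beta: float,
    weights: Optional[Sequence[float]] = None,
) -> Tuple[float, float]:
    """Return (worst-case, weighted-average) empirical Holder constants.

    `weights` are non-negative probability masses with a positive sum; None
    means uniform weights, i.e. the points are treated as an i.i.d. sample from mu.
    """
    slopes = local_holder_slopes(xs, fs, beta)
    if weights is None:
        weights = [1.0] * len(slopes)
    total = sum(weights)
    average = sum(w * s for w, s in zip(weights, slopes)) / total
    return max(slopes), average


if __name__ == "__main__":
    # A unit step at x = 0.5 with only two sample points straddling the jump.
    left = [i / 100 for i in range(40)]          # 0.00 .. 0.39, f = 0
    right = [0.6 + i / 100 for i in range(40)]   # 0.60 .. 0.99, f = 1
    xs = left + [0.5, 0.5001] + right
    fs = [0.0 if x <= 0.5 else 1.0 for x in xs]

    worst, avg = holder_constants(xs, fs, beta=1.0)
    print(f"worst-case: {worst:.1f}, uniform average: {avg:.1f}")

    # Same function, but a distribution that puts heavy mass on the jump.
    heavy = [1000.0 if 0.5 <= x <= 0.5001 else 1.0 for x in xs]
    _, avg_heavy = holder_constants(xs, fs, beta=1.0, weights=heavy)
    print(f"average under jump-heavy weights: {avg_heavy:.1f}")
```

For the step function, the worst-case constant is on the order of 10^4 and is driven entirely by the two points straddling the jump. The uniform average stays in the low hundreds. Reweighting the same points so that most of the mass sits on the jump pushes the average back toward the worst-case value. This is the distribution sensitivity the abstract describes.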