We introduce a general framework for analyzing learning algorithms based on the notion of self-regularization, which captures implicit complexity control without requiring explicit regularization. This is motivated by previous observations that many algorithms, such as gradient-descent-based learning, exhibit implicit regularization. In a nutshell, for a self-regularized algorithm the complexity of the predictor is inherently controlled by that of the simplest comparator achieving the same empirical risk. This framework is rich enough to cover both classical regularized empirical risk minimization and gradient descent. Building on self-regularization, we provide a thorough statistical analysis of such algorithms, including minimax-optimal rates: it suffices to show that the algorithm is self-regularized -- all further requirements stem from the learning problem itself. Finally, we discuss the problem of data-dependent hyperparameter selection, providing a general result that yields minimax-optimal rates up to a double-logarithmic factor and covers data-driven early stopping for RKHS-based gradient descent.
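The self-regularization property described above can be sketched schematically. The notation here is illustrative and not fixed by the abstract itself: a norm $\|\cdot\|$ measuring predictor complexity, an empirical risk $\widehat{R}_n$, a learned predictor $\hat{f}$, and a constant $C \ge 1$.

```latex
% Schematic self-regularization condition (illustrative notation only):
% the learned predictor \hat{f} is at most a constant factor more complex
% than the simplest comparator g achieving the same empirical risk.
\[
  \|\hat{f}\| \;\le\; C \cdot \inf\bigl\{ \|g\| \;:\; \widehat{R}_n(g) \le \widehat{R}_n(\hat{f}) \bigr\},
  \qquad C \ge 1 .
\]
```

Read this way, explicit regularizers (which penalize $\|g\|$ directly) and implicitly regularizing procedures such as gradient descent can both be analyzed through the same condition.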