Random Smoothing Regularization in Kernel Gradient Descent Learning

Random smoothing data augmentation is a distinctive form of regularization that can prevent overfitting by injecting noise into the input data, encouraging the model to learn more generalizable features. Despite its success in various applications, the regularization ability of random smoothing has not been studied systematically. In this paper, we aim to bridge this gap by presenting a framework for random smoothing regularization that can adaptively and effectively learn a wide range of ground-truth functions belonging to the classical Sobolev spaces. Specifically, we investigate two underlying function spaces: the Sobolev space of low intrinsic dimension, which includes as special cases the Sobolev space in $D$-dimensional Euclidean space and Sobolev spaces on low-dimensional sub-manifolds, and the mixed smooth Sobolev space with a tensor structure. By viewing random smoothing regularization as a novel class of convolution-based smoothing kernels, we attain optimal convergence rates in these cases using a kernel gradient descent algorithm with either early stopping or weight decay. Notably, our estimator adapts to the structural assumptions of the underlying data and avoids the curse of dimensionality. This is achieved through different choices of the injected noise distribution, such as Gaussian, Laplace, or general polynomial noise, which allow broad adaptation to these structural assumptions. The convergence rate depends only on the effective dimension, which may be significantly smaller than the actual data dimension. We conduct numerical experiments on simulated data to validate our theoretical results.
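
The following is a minimal sketch of the two ingredients named above: a smoothing kernel obtained by injecting noise into the inputs, and kernel gradient descent with early stopping (a fixed iteration count) or weight decay. It is not the paper's construction. It assumes a Gaussian (RBF) base kernel with bandwidth h and Gaussian noise of scale sigma added independently to both kernel arguments. Under these assumptions the noise only widens and rescales the kernel, which gives a closed form, and a Monte Carlo estimate is included for comparison. The names rbf, smoothed_rbf, smoothed_rbf_mc, and kernel_gd, the toy data, and all step sizes are hypothetical. Laplace or polynomial noise would need the Monte Carlo route or a different closed form.

```python
import math
import random


def rbf(x, y, h):
    """Gaussian (RBF) base kernel exp(-||x - y||^2 / (2 h^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * h * h))


def smoothed_rbf(x, y, h, sigma):
    """Closed form of E[rbf(x + e, y + e')] with e, e' ~ N(0, sigma^2 I) independent.

    Assumption: noise is injected into both arguments. Then e - e' ~ N(0, 2 sigma^2 I),
    so smoothing widens the bandwidth to sqrt(h^2 + 2 sigma^2) and rescales the kernel.
    """
    d = len(x)
    s2 = h * h + 2.0 * sigma * sigma
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return (h * h / s2) ** (d / 2.0) * math.exp(-sq / (2.0 * s2))


def smoothed_rbf_mc(x, y, h, sigma, n_samples=20000, seed=0):
    """Monte Carlo estimate of the same expectation by direct noise injection."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        xn = [a + rng.gauss(0.0, sigma) for a in x]
        yn = [b + rng.gauss(0.0, sigma) for b in y]
        total += rbf(xn, yn, h)
    return total / n_samples


def kernel_gd(xs, ys, kernel, step, n_iters, weight_decay=0.0):
    """Kernel gradient descent on the empirical loss (1 / 2n) * sum_i (f(x_i) - y_i)^2.

    With f = sum_i alpha_i k(., x_i), one functional gradient step is
    f <- (1 - step * weight_decay) f - (step / n) sum_i (f(x_i) - y_i) k(., x_i),
    i.e. alpha <- (1 - step * weight_decay) alpha - (step / n) (K alpha - y).
    Early stopping corresponds to choosing n_iters; weight_decay > 0 shrinks f.
    """
    n = len(xs)
    K = [[kernel(a, b) for b in xs] for a in xs]
    alpha = [0.0] * n
    for _ in range(n_iters):
        fitted = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        alpha = [
            (1.0 - step * weight_decay) * alpha[i] - (step / n) * (fitted[i] - ys[i])
            for i in range(n)
        ]

    def predict(x):
        return sum(a * kernel(x, xi) for a, xi in zip(alpha, xs))

    return predict, alpha


# Toy usage (hypothetical data): noisy samples of sin(2*pi*x) on a 1-D grid.
data_rng = random.Random(1)
xs = [[i / 19.0] for i in range(20)]
ys = [math.sin(2 * math.pi * x[0]) + 0.1 * data_rng.gauss(0.0, 1.0) for x in xs]


def smoothed(a, b):
    return smoothed_rbf(a, b, h=0.1, sigma=0.05)


def train_mse(predict):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


short_fit, _ = kernel_gd(xs, ys, smoothed, step=1.0, n_iters=10)
long_fit, alpha_gd = kernel_gd(xs, ys, smoothed, step=1.0, n_iters=200)
_, alpha_wd = kernel_gd(xs, ys, smoothed, step=1.0, n_iters=200, weight_decay=0.1)
print(train_mse(short_fit), train_mse(long_fit))
```

With step = 1.0 and kernel values bounded by 1, each iteration contracts the training residual. The training error therefore keeps falling as n_iters grows. That is why the stopping time acts as the regularization knob. Weight decay instead shrinks every iterate toward zero.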

翻译：随机平滑数据增强是一种独特的正则化形式，通过向输入数据引入噪声来防止过拟合，从而鼓励模型学习更泛化的特征。尽管该方法在各类应用中取得了成功，但关于随机平滑正则化能力的系统性研究仍然匮乏。本文旨在填补这一空白，提出一个能够自适应且有效学习经典Sobolev空间中多种真值函数的随机平滑正则化框架。具体而言，我们研究了两类底层函数空间：低本征维度的Sobolev空间（包含$D$维欧氏空间或低维子流形中的Sobolev空间作为特例），以及具有张量结构的混合光滑Sobolev空间。通过将随机平滑正则化作为一类新型卷积平滑核，我们采用核梯度下降算法（结合早停法或权重衰减）即可在这些情况下达到最优收敛率。值得注意的是，我们的估计量能够自适应底层数据的结构假设并避免维度灾难。这得益于对注入噪声分布（如高斯分布、拉普拉斯分布或一般多项式噪声）的多样化选择，从而广泛适配上述底层数据的结构假设。收敛率仅取决于有效维度，该维度可能远小于实际数据维度。我们在模拟数据上进行了数值实验以验证理论结果。