Random Smoothing Regularization in Kernel Gradient Descent Learning

Random smoothing data augmentation is a form of regularization that prevents overfitting by injecting noise into the input data, which encourages the model to learn more generalizable features. Despite its success in various applications, the regularization ability of random smoothing has not been studied systematically. In this paper, we aim to bridge this gap by presenting a framework for random smoothing regularization that can adaptively and effectively learn a wide range of ground truth functions belonging to the classical Sobolev spaces. Specifically, we investigate two underlying function spaces: the Sobolev space of low intrinsic dimension, which includes the Sobolev space on $D$-dimensional Euclidean space or on low-dimensional sub-manifolds as special cases, and the mixed smooth Sobolev space with a tensor structure. By using random smoothing regularization in the form of novel convolution-based smoothing kernels, we attain optimal convergence rates in these cases using a kernel gradient descent algorithm with either early stopping or weight decay. Notably, our estimator adapts to the structural assumptions of the underlying data and avoids the curse of dimensionality. This adaptivity is achieved through the choice of injected noise distribution, such as Gaussian, Laplace, or general polynomial noise. The convergence rate depends only on the effective dimension, which may be significantly smaller than the actual data dimension. We conduct numerical experiments on simulated data to validate our theoretical results.
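
As a rough illustration of the ingredients named above, the following minimal sketch combines a noise-smoothed kernel with kernel gradient descent stopped after a fixed number of steps. It is not the paper's estimator: the Gaussian base kernel, the closed-form average over Gaussian input noise, and the names `smoothed_gaussian_kernel`, `kernel_gradient_descent`, `bandwidth`, `noise_std`, `step_size`, and `n_steps` are all assumptions made for illustration.

```python
import numpy as np


def smoothed_gaussian_kernel(X, Z, bandwidth=1.0, noise_std=0.5):
    """Gaussian base kernel averaged over injected Gaussian input noise.

    Assumption: the base kernel is k(x, z) = exp(-||x - z||^2 / (2 h^2)) and
    independent N(0, noise_std^2 I) noise is added to both arguments. The
    expectation E[k(x + e, z + e')] then has the closed form used below.
    """
    d = X.shape[1]
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    var = bandwidth**2 + 2.0 * noise_std**2  # widened by the noise difference
    scale = (bandwidth**2 / var) ** (d / 2.0)
    return scale * np.exp(-sq_dists / (2.0 * var))


def kernel_gradient_descent(K, y, step_size=1.0, n_steps=100):
    """Gradient descent on the squared loss in the kernel's function space.

    Returns coefficients alpha with f(x) = sum_i alpha_i k(x_i, x).
    Early stopping is expressed here simply by the choice of n_steps.
    """
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(n_steps):
        residual = K @ alpha - y  # f_t(x_i) - y_i on the training points
        alpha -= (step_size / n) * residual
    return alpha


# Hypothetical usage on simulated data (illustrative only).
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(100, 2))
y_train = np.sin(3.0 * X_train[:, 0]) + 0.1 * rng.standard_normal(100)
X_test = rng.uniform(-1.0, 1.0, size=(20, 2))

K_train = smoothed_gaussian_kernel(X_train, X_train)
alpha = kernel_gradient_descent(K_train, y_train, step_size=1.0, n_steps=200)
y_pred = smoothed_gaussian_kernel(X_test, X_train) @ alpha
```

In this sketch the number of steps plays the role of the regularization parameter: fewer steps give a smoother fit, more steps fit the training data more closely. A different noise distribution, such as Laplace, would change the smoothed kernel and would generally need a different closed form or a Monte Carlo average.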