A stochastic optimization approach to train non-linear neural networks with a higher-order variation regularization

While highly expressive parametric models, including deep neural networks, have the advantage of being able to model complicated concepts, training such highly non-linear models is known to carry a high risk of overfitting. To address this issue, this study considers a $(k,q)$th order variation regularization ($(k,q)$-VR), defined as the integral of the $q$th power of the absolute $k$th order derivative of the parametric model being trained. Penalizing the $(k,q)$-VR is expected to yield a smoother function and thereby avoid overfitting. In particular, $(k,q)$-VR encompasses the conventional (general-order) total variation as the case $q=1$. Although $(k,q)$-VR terms applied to general parametric models are computationally intractable because of the integration, this study provides a stochastic optimization algorithm that can efficiently train general models with the $(k,q)$-VR without explicit numerical integration. The proposed approach can be applied to the training of deep neural networks of arbitrary architecture, since it can be implemented with only a simple stochastic gradient descent algorithm and automatic differentiation. Our numerical experiments demonstrate that neural networks trained with $(k,q)$-VR terms are more ``resilient'' than those trained with conventional parameter regularization. The proposed algorithm can also be extended to the training of physics-informed neural networks (PINNs).
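The central claim of the abstract, that the integral never has to be computed explicitly, can be illustrated with a short sketch. Writing $f_\theta^{(k)}$ for the $k$th derivative of the model and assuming a bounded domain $\Omega$ with volume $|\Omega|$, the penalty is an expectation under uniform sampling,

$$\int_\Omega \big|f_\theta^{(k)}(x)\big|^q\,dx = |\Omega|\,\mathbb{E}_{x\sim\mathcal{U}(\Omega)}\Big[\big|f_\theta^{(k)}(x)\big|^q\Big],$$

so averaging $|f_\theta^{(k)}|^q$ over a few random points gives an unbiased estimate of the penalty and, under the usual conditions for differentiating under the integral sign, of its (sub)gradient, which plain SGD can consume. The Python below is our own minimal sketch of that idea, not the authors' algorithm. It assumes a one-dimensional input, uniform sampling over the range of the training inputs, $q \ge 1$, and a polynomial model $f_\theta(x)=\sum_j \theta_j x^j$ standing in for a neural network so that derivatives are available in closed form rather than through automatic differentiation. The function names and hyperparameters are hypothetical.

```python
import math
import random

# Minimal sketch (assumption): a 1-D polynomial model stands in for a neural network,
# so k-th derivatives are exact; all names and hyperparameters here are hypothetical.


def kth_derivative(theta, x, k):
    """Exact k-th derivative of the toy model f(x) = sum_j theta[j] * x**j."""
    # math.perm(j, k) = j! / (j - k)! is the factor produced by differentiating x**j k times.
    return sum(theta[j] * math.perm(j, k) * x ** (j - k) for j in range(k, len(theta)))


def vr_estimate(theta, k, q, a, b, n_samples, rng):
    """Unbiased Monte Carlo estimate of int_a^b |f^(k)(x)|^q dx, with no quadrature grid."""
    total = 0.0
    for _ in range(n_samples):
        x = rng.uniform(a, b)  # x ~ Uniform[a, b]
        total += abs(kth_derivative(theta, x, k)) ** q
    return (b - a) * total / n_samples


def vr_grad_estimate(theta, k, q, a, b, n_samples, rng):
    """Stochastic (sub)gradient of the (k,q)-VR term with respect to theta (q >= 1)."""
    grad = [0.0] * len(theta)
    for _ in range(n_samples):
        x = rng.uniform(a, b)
        g = kth_derivative(theta, x, k)
        # Chain rule: d|g|^q/dg = q * |g|^(q-1) * sign(g); 0 is used as the subgradient at g == 0.
        outer = (q * abs(g) ** (q - 1) * math.copysign(1.0, g)) if g != 0 else 0.0
        for j in range(k, len(theta)):
            grad[j] += outer * math.perm(j, k) * x ** (j - k)
    return [(b - a) * gj / n_samples for gj in grad]


def train(xs, ys, degree, k, q, lam, lr=0.05, steps=2000, n_vr=8, seed=0):
    """Plain SGD on squared error + lam * (k,q)-VR, with VR sampled over [min(xs), max(xs)]."""
    rng = random.Random(seed)
    a, b = min(xs), max(xs)
    theta = [0.0] * (degree + 1)
    for _ in range(steps):
        i = rng.randrange(len(xs))  # one data point per step
        resid = sum(t * xs[i] ** j for j, t in enumerate(theta)) - ys[i]
        data_grad = [2.0 * resid * xs[i] ** j for j in range(len(theta))]
        vr_grad = vr_grad_estimate(theta, k, q, a, b, n_vr, rng)  # fresh random points each step
        theta = [t - lr * (dg + lam * vg) for t, dg, vg in zip(theta, data_grad, vr_grad)]
    return theta
```

For instance, `train(xs, ys, degree=3, k=2, q=2, lam=0.01)` with `xs = [-1 + 2 * i / 19 for i in range(20)]` and `ys = [x**3 - x for x in xs]` fits a cubic while penalizing the squared second derivative over $[-1, 1]$. For a neural network, `kth_derivative` would be replaced by $k$ nested automatic-differentiation calls (for example, `jax.grad` applied $k$ times in JAX); the sampling and averaging would stay the same.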

