Three ways that non-differentiability affects neural network training

This paper investigates how non-differentiability affects three different aspects of the neural network training process. We first analyze fully connected neural networks with ReLU activations, for which we show that the continuously differentiable neural networks converge faster than non-differentiable neural networks. Next, we analyze the problem of $L_{1}$ regularization and show that the solutions produced by deep learning solvers are incorrect and counter-intuitive even for the $L_{1}$ penalized linear model. Finally, we analyze the Edge of Stability problem, where we show that all convex, non-smooth, Lipschitz continuous functions display unstable convergence, and provide an example of a result derived using twice differentiable functions which fails in the once differentiable setting. More generally, our results suggest that accounting for the non-linearity of neural networks in the training process is essential for us to develop better algorithms, and to get a better understanding of the training process in general.

翻译：本文研究了不可微性如何影响神经网络训练过程的三个不同方面。首先，我们分析了使用ReLU激活函数的全连接神经网络，结果表明连续可微的神经网络比不可微的神经网络收敛更快。其次，我们分析了$L_{1}$正则化问题，并证明深度学习求解器产生的解即使对于$L_{1}$惩罚线性模型也是错误且反直觉的。最后，我们分析了"稳定性边缘"问题，结果表明所有凸、非光滑、Lipschitz连续函数均表现出不稳定的收敛性，并提供了一个利用二次可微函数推导但在一阶可微场景中失效的结果实例。更广泛而言，我们的结果提示，在训练过程中考虑神经网络的非线性特性对于开发更优算法以及整体上更深入理解训练过程至关重要。

相关内容

Neural Networks

关注 1654

神经网络（Neural Networks）是世界上三个最古老的神经建模学会的档案期刊:国际神经网络学会(INNS)、欧洲神经网络学会(ENNS)和日本神经网络学会(JNNS)。神经网络提供了一个论坛，以发展和培育一个国际社会的学者和实践者感兴趣的所有方面的神经网络和相关方法的计算智能。神经网络欢迎高质量论文的提交，有助于全面的神经网络研究，从行为和大脑建模，学习算法，通过数学和计算分析，系统的工程和技术应用，大量使用神经网络的概念和技术。这一独特而广泛的范围促进了生物和技术研究之间的思想交流，并有助于促进对生物启发的计算智能感兴趣的跨学科社区的发展。因此，神经网络编委会代表的专家领域包括心理学，神经生物学，计算机科学，工程，数学，物理。该杂志发表文章、信件和评论以及给编辑的信件、社论、时事、软件调查和专利信息。文章发表在五个部分之一:认知科学，神经科学，学习系统，数学和计算分析、工程和应用。官网地址：http://dblp.uni-trier.de/db/journals/nn/

Nat. Biotechnol. | 机器学习为生物库驱动的药物发现提供动力

专知会员服务

11+阅读 · 2022年9月12日

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

生成性对抗网络:理论模型、评估指标和最近发展的概述，Generative Adversarial Networks (GANs): An Overview of Theoretical Model, Evaluation Metrics, and Recent Developments

专知会员服务

42+阅读 · 2020年5月30日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日