The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models

from arxiv, A preliminary version of part of this work was published at ICML 2020 with the title "Neural Networks are Convex Regularizers: Exact Polynomial-time Convex Optimization Formulations for Two-layer Networks"

Due to the non-convex nature of training Deep Neural Network (DNN) models, their effectiveness relies on the use of non-convex optimization heuristics. Traditional methods for training DNNs often require costly empirical methods to produce successful models and do not have a clear theoretical foundation. In this study, we examine the use of convex optimization theory and sparse recovery models to refine the training process of neural networks and provide a better interpretation of their optimal weights. We focus on training two-layer neural networks with piecewise linear activations and demonstrate that they can be formulated as a finite-dimensional convex program. These programs include a regularization term that promotes sparsity, which constitutes a variant of group Lasso. We first utilize semi-infinite programming theory to prove strong duality for finite width neural networks and then we express these architectures equivalently as high dimensional convex sparse recovery models. Remarkably, the worst-case complexity to solve the convex program is polynomial in the number of samples and number of neurons when the rank of the data matrix is bounded, which is the case in convolutional networks. To extend our method to training data of arbitrary rank, we develop a novel polynomial-time approximation scheme based on zonotope subsampling that comes with a guaranteed approximation ratio. We also show that all the stationary of the nonconvex training objective can be characterized as the global optimum of a subsampled convex program. Our convex models can be trained using standard convex solvers without resorting to heuristics or extensive hyper-parameter tuning unlike non-convex methods. Through extensive numerical experiments, we show that convex models can outperform traditional non-convex methods and are not sensitive to optimizer hyperparameters.

翻译：由于深度神经网络（DNN）模型的非凸训练特性，其有效性高度依赖非凸优化启发式方法。传统DNN训练方法通常依赖成本高昂的经验方法生成成功模型，且缺乏清晰的理论基础。本研究探讨利用凸优化理论与稀疏恢复模型优化神经网络训练过程，并对其最优权重提供更深入的理论解释。我们聚焦于训练具有分段线性激活函数的两层神经网络，证明此类网络可被转化为有限维凸规划问题。这些规划包含促进稀疏性的正则化项，构成了群组Lasso的变体。我们首先利用半无限规划理论证明有限宽度神经网络的强对偶性，随后将这些架构等价表示为高维凸稀疏恢复模型。值得注意的是，当数据矩阵秩有界时（这在卷积网络中常见），该凸规划的最坏情况复杂度在样本数与神经元数上呈多项式级。为将方法扩展至任意秩的训练数据，我们基于带状多面体子采样提出具有保证近似比的多项式时间近似方案。我们进一步证明：非凸训练目标的所有驻点均可表征为子采样凸规划的全局最优解。与需要启发式方法或广泛超参数调优的非凸方法不同，我们的凸模型可通过标准凸优化求解器直接训练。大量数值实验表明，凸模型不仅优于传统非凸方法，而且对优化器超参数不敏感。

相关内容

Neural Networks

关注 1654

神经网络（Neural Networks）是世界上三个最古老的神经建模学会的档案期刊:国际神经网络学会(INNS)、欧洲神经网络学会(ENNS)和日本神经网络学会(JNNS)。神经网络提供了一个论坛，以发展和培育一个国际社会的学者和实践者感兴趣的所有方面的神经网络和相关方法的计算智能。神经网络欢迎高质量论文的提交，有助于全面的神经网络研究，从行为和大脑建模，学习算法，通过数学和计算分析，系统的工程和技术应用，大量使用神经网络的概念和技术。这一独特而广泛的范围促进了生物和技术研究之间的思想交流，并有助于促进对生物启发的计算智能感兴趣的跨学科社区的发展。因此，神经网络编委会代表的专家领域包括心理学，神经生物学，计算机科学，工程，数学，物理。该杂志发表文章、信件和评论以及给编辑的信件、社论、时事、软件调查和专利信息。文章发表在五个部分之一:认知科学，神经科学，学习系统，数学和计算分析、工程和应用。官网地址：http://dblp.uni-trier.de/db/journals/nn/

O’Reilly报告：知识图谱崛起——面向现代数据集成和数据结构体系，“The Rise of the Knowledge Graph——Toward Modern Data Integration and the Data Fabric Architecture”

专知会员服务

49+阅读 · 2022年2月18日

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日