The training of deep neural networks predominantly relies on gradient-based optimisation combined with back-propagation for computing the gradient. While incredibly successful, this approach faces challenges such as vanishing or exploding gradients, difficulties with non-smooth activations, and an inherently sequential structure that limits parallelisation. Lifted training methods offer an alternative by reformulating the nested optimisation problem as a higher-dimensional, constrained optimisation problem in which the constraints are no longer enforced exactly but are instead relaxed via penalty terms. This chapter introduces a unified framework that encapsulates various lifted training strategies, including the Method of Auxiliary Coordinates, Fenchel Lifted Networks, and Lifted Bregman Training, and demonstrates how diverse architectures, such as Multi-Layer Perceptrons, Residual Neural Networks, and Proximal Neural Networks, fit within this structure. By leveraging tools from convex optimisation, particularly Bregman distances, the framework facilitates distributed optimisation, accommodates non-differentiable proximal activations, and can improve the conditioning of the training landscape. We discuss how these methods can be implemented with block-coordinate descent strategies, including deterministic variants enhanced by accelerated and adaptive optimisation techniques, as well as implicit stochastic gradient methods. Furthermore, we explore the application of this framework to inverse problems, detailing methodologies for both the training of specialised networks (e.g., unrolled architectures) and the stable inversion of pre-trained networks. Numerical results on standard imaging tasks validate the effectiveness and stability of the lifted Bregman approach compared to conventional training, particularly for architectures employing proximal activations.
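As a schematic illustration of the reformulation described above (the notation used here, auxiliary variables $z_l$, penalties $D_l$ and weight $\rho$, is chosen for exposition and need not match the chapter's own), conventional training of an $L$-layer network with activation $\sigma$ solves the nested problem
\[
\min_{W_1,\dots,W_L} \; \ell\bigl(\sigma(W_L\,\sigma(\cdots\sigma(W_1 x)\cdots)),\, y\bigr),
\]
whereas a lifted formulation introduces one auxiliary variable per layer output and relaxes the constraints $z_l = \sigma(W_l z_{l-1})$ into penalty terms,
\[
\min_{\{W_l\},\,\{z_l\}} \; \ell(z_L,\, y) \;+\; \rho \sum_{l=1}^{L} D_l\bigl(z_l,\, W_l z_{l-1}\bigr), \qquad z_0 = x,
\]
where, depending on the method, $D_l$ may for instance be a quadratic penalty $\tfrac{1}{2}\|z_l - \sigma(W_l z_{l-1})\|^2$ (as in the Method of Auxiliary Coordinates) or a Bregman-type penalty encoding a proximal activation (as in Lifted Bregman Training). Decoupling the layers in this way is what enables the block-coordinate and distributed optimisation strategies discussed in the chapter.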