The training of deep neural networks predominantly relies on gradient-based optimisation combined with back-propagation for computing the gradient. While incredibly successful, this approach faces challenges such as vanishing or exploding gradients, difficulties with non-smooth activations, and an inherently sequential structure that limits parallelisation. Lifted training methods offer an alternative by reformulating the nested optimisation problem as a higher-dimensional, constrained optimisation problem in which the constraints are no longer enforced exactly but are instead relaxed via penalty terms. This chapter introduces a unified framework that encapsulates various lifted training strategies, including the Method of Auxiliary Coordinates, Fenchel Lifted Networks, and Lifted Bregman Training, and demonstrates how diverse architectures, such as Multi-Layer Perceptrons, Residual Neural Networks, and Proximal Neural Networks, fit within this structure. By leveraging tools from convex optimisation, particularly Bregman distances, the framework facilitates distributed optimisation, accommodates non-differentiable proximal activations, and can improve the conditioning of the training landscape. We discuss how these methods can be implemented with block-coordinate descent strategies, including deterministic variants enhanced by accelerated and adaptive optimisation techniques, as well as implicit stochastic gradient methods. Furthermore, we explore the application of this framework to inverse problems, detailing methodologies for both the training of specialised networks (e.g., unrolled architectures) and the stable inversion of pre-trained networks. Numerical results on standard imaging tasks validate the effectiveness and stability of the lifted Bregman approach compared to conventional training, particularly for architectures employing proximal activations.
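As a schematic illustration of the reformulation described above (the notation used here, auxiliary variables $z_l$, penalties $D_l$ and weight $\rho$, is chosen for exposition and need not match the chapter's own), conventional training of an $L$-layer network with activation $\sigma$ solves the nested problem
\[
\min_{W_1,\dots,W_L} \; \ell\bigl(\sigma(W_L\,\sigma(\cdots\sigma(W_1 x)\cdots)),\, y\bigr),
\]
whereas a lifted formulation introduces one auxiliary variable per layer output and relaxes the constraints $z_l = \sigma(W_l z_{l-1})$ into penalty terms,
\[
\min_{\{W_l\},\,\{z_l\}} \; \ell(z_L,\, y) \;+\; \rho \sum_{l=1}^{L} D_l\bigl(z_l,\, W_l z_{l-1}\bigr), \qquad z_0 = x,
\]
where, depending on the method, $D_l$ may for instance be a quadratic penalty $\tfrac{1}{2}\|z_l - \sigma(W_l z_{l-1})\|^2$ (as in the Method of Auxiliary Coordinates) or a Bregman-type penalty encoding a proximal activation (as in Lifted Bregman Training). Decoupling the layers in this way is what enables the block-coordinate and distributed optimisation strategies discussed in the chapter.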