State-of-the-art deep learning models have parameter counts that reach into the billions. Training, storing, and transferring such models consumes energy and time, and is therefore costly. A large part of these costs is caused by training the network. Model compression lowers storage and transfer costs, and can also make training more efficient by decreasing the number of computations in the forward and/or backward pass. Thus, compressing networks already at training time while maintaining high performance is an important research topic. This work is a survey of methods that reduce the number of trained weights in deep learning models throughout training. Most of the introduced methods set network parameters to zero, which is called pruning. The presented pruning approaches are categorized into pruning at initialization, lottery tickets, and dynamic sparse training. Moreover, we discuss methods that freeze parts of a network at its random initialization. Freezing weights shrinks the number of trainable parameters, which reduces gradient computations and the dimensionality of the model's optimization space. In this survey, we first propose dimensionality-reduced training as an underlying mathematical model that covers pruning and freezing during training. Afterwards, we present and discuss different dimensionality-reduced training methods.