Efficient Compression of Overparameterized Deep Models through Low-Dimensional Learning Dynamics

Overparameterized models have proven to be powerful tools for solving various machine learning tasks. However, overparameterization often leads to a substantial increase in computational and memory costs, which in turn makes training resource-intensive. In this work, we aim to reduce this complexity by studying the learning dynamics of overparameterized deep networks. Through an extensive study of these dynamics, we find that the weight matrices of various architectures exhibit a low-dimensional structure. This finding implies that we can compress the networks by restricting training to a small subspace. We take a step toward a principled approach for compressing deep networks by studying deep linear models. We demonstrate that the principal components of deep linear models are fitted incrementally but within a small subspace, and we use these insights to compress deep linear networks by decreasing the width of their intermediate layers. Remarkably, we observe that with a particular choice of initialization, the compressed network converges faster than the original network, consistently yielding smaller recovery errors throughout all iterations of gradient descent. We substantiate this observation by developing a theory focused on the deep matrix factorization problem and by conducting empirical evaluations on deep matrix sensing. Finally, we demonstrate how our compressed model can enhance the utility of deep nonlinear models. Overall, we observe that our compression technique accelerates training by more than 2x without compromising model quality.
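The claim that training stays inside a small subspace can be checked numerically on the deep matrix factorization problem mentioned above. The sketch below is our own illustration, not the authors' code: it runs plain gradient descent on f(W_1, ..., W_L) = ½‖W_L⋯W_1 − Φ‖_F² from a scaled orthogonal initialization W_l(0) = εO_l, where the depth, width d = 30, rank r = 2, ε, step size, and the synthetic target Φ are all hypothetical choices. Because the initialization is orthogonal and Φ has rank r, each layer (in exact arithmetic, by our own derivation for this setting) keeps acting as one common scalar times its initial orthogonal map on a subspace of dimension at least d − 2r, so only a 2r-dimensional part of every layer actually evolves. The helper `num_shared_singular_values` (hypothetical name) counts how many singular values of a trained layer still share that common value.

```python
import numpy as np


def orthogonal(d, rng):
    """Random d x d orthogonal matrix from the QR factorization of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # flip column signs so the result is Haar-distributed


def end_to_end(weights, d):
    """Product W_L ... W_1 for weights = [W_1, ..., W_L]; the identity if the list is empty."""
    prod = np.eye(d)
    for w in weights:
        prod = w @ prod
    return prod


def low_rank_target(d, r, seed=1):
    """Hypothetical rank-r target Phi = U diag(s) V^T, singular values spaced from 1.0 to 0.5."""
    rng = np.random.default_rng(seed)
    u = orthogonal(d, rng)[:, :r]
    v = orthogonal(d, rng)[:, :r]
    return u @ np.diag(np.linspace(1.0, 0.5, r)) @ v.T


def train_deep_mf(target, depth=3, eps=0.1, lr=0.1, steps=3000, seed=0):
    """Gradient descent on 0.5 * ||W_L ... W_1 - target||_F^2 from W_l(0) = eps * O_l.

    Returns the final weights and the recovery error ||W_L ... W_1 - target||_F
    recorded before every step and once after the last step.
    """
    rng = np.random.default_rng(seed)
    d = target.shape[0]
    weights = [eps * orthogonal(d, rng) for _ in range(depth)]
    errors = []
    for _ in range(steps):
        residual = end_to_end(weights, d) - target
        errors.append(np.linalg.norm(residual))
        grads = []
        for l in range(depth):
            after = end_to_end(weights[l + 1:], d)  # layers applied after layer l (0-indexed)
            before = end_to_end(weights[:l], d)     # layers applied before layer l
            # d/dW of 0.5 * ||A W B - T||_F^2 is A^T (A W B - T) B^T
            grads.append(after.T @ residual @ before.T)
        # simultaneous update: all gradients use the pre-step weights
        weights = [w - lr * g for w, g in zip(weights, grads)]
    errors.append(np.linalg.norm(end_to_end(weights, d) - target))
    return weights, errors


def num_shared_singular_values(w, tol=1e-8):
    """Count singular values of w that coincide (within tol) with its median singular value."""
    s = np.linalg.svd(w, compute_uv=False)
    return int(np.sum(np.abs(s - np.median(s)) < tol))


if __name__ == "__main__":
    d, r = 30, 2
    weights, errors = train_deep_mf(low_rank_target(d, r))
    print(f"recovery error: {errors[0]:.3f} -> {errors[-1]:.4f}")
    for l, w in enumerate(weights, start=1):
        print(f"W_{l}: {num_shared_singular_values(w)} of {d} singular values share one value")
```

With these settings the recovery error should drop by more than an order of magnitude while at least d − 2r = 26 singular values of every layer stay tied together; the small error floor (roughly ε^L·√(d − 2r)) comes from that untouched subspace. This only illustrates why shrinking the intermediate width is plausible; the compressed network and the particular initialization described in the abstract are not reproduced here.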