Efficient Compression of Overparameterized Deep Models through Low-Dimensional Learning Dynamics

Overparameterized models have proven to be powerful tools for solving various machine learning tasks. However, overparameterization often leads to a substantial increase in computational and memory costs, which in turn requires extensive resources to train. In this work, we present a novel approach for compressing overparameterized models, developed through studying their learning dynamics. We observe that for many deep models, updates to the weight matrices occur within a low-dimensional invariant subspace. For deep linear models, we demonstrate that their principal components are fitted incrementally within a small subspace, and use these insights to propose a compression algorithm for deep linear networks that involve decreasing the width of their intermediate layers. We empirically evaluate the effectiveness of our compression technique on matrix recovery problems. Remarkably, by using an initialization that exploits the structure of the problem, we observe that our compressed network converges faster than the original network, consistently yielding smaller recovery errors. We substantiate this observation by developing a theory focused on deep matrix factorization. Finally, we empirically demonstrate how our compressed model has the potential to improve the utility of deep nonlinear models. Overall, our algorithm improves the training efficiency by more than 2x, without compromising generalization.

翻译：过参数化模型已被证明是解决各种机器学习任务的强大工具。然而，过参数化通常会导致计算和内存成本大幅增加，进而需要大量资源进行训练。在本工作中，我们提出了一种通过研究过参数化模型学习动力学进行压缩的新方法。我们观察到，对于许多深度模型而言，权重矩阵的更新发生在低维不变子空间内。对于深度线性模型，我们证明其主成分是在小子空间内逐步拟合的，并利用这些见解提出了一种针对深度线性网络的压缩算法，该算法涉及减小其中间层的宽度。我们在矩阵恢复问题上经验性地评估了压缩技术的有效性。值得注意的是，通过使用利用问题结构的初始化，我们观察到压缩网络的收敛速度比原始网络更快，且始终获得更小的恢复误差。我们通过发展以深度矩阵分解为中心的理论证实了这一观察结果。最后，我们经验性地展示了压缩模型如何具有提升深度非线性模型效用的潜力。总体而言，我们的算法在不影响泛化能力的情况下，将训练效率提升了2倍以上。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日