Lossless Model Compression via Joint Low-Rank Factorization Optimization

Low-rank factorization is a popular model compression technique that minimizes the error $\delta$ between the approximated and original weight matrices. Although optimizing $\delta$ yields performance close to that of the original model, a gap remains because the factorization and the model's performance are optimized separately, which makes some loss unavoidable. We address this issue by introducing a novel joint optimization strategy for lossless low-rank weight factorization, which, for the first time, enhances the model's performance beyond the original. Our approach begins with a theoretical analysis of the relationship between the low-rank factorization and model optimization objectives, establishing a precise perturbation range for the effect of matrix factorization errors on model performance. We then reformulate this challenge as a numerical rank deficiency problem with inequality constraints and develop a joint objective that simultaneously addresses factorization error and model performance. Based on this analysis, we propose two optimization algorithms: \textbf{a lossless optimization algorithm} that maximizes model accuracy while ensuring compression, and \textbf{a compact optimization algorithm} that minimizes model size while preserving performance. These algorithms require no fine-tuning and can directly compress numerous deep models to achieve lossless results. Our methods demonstrate robust efficacy across various vision and language tasks. For example, a ResNext50 model compressed by 70\% outperforms the original. Our code will be made public.
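To make the baseline concrete, the following is a minimal sketch of conventional low-rank factorization, where only the error $\delta$ is minimized. It is not the joint optimization proposed in the paper. It assumes $\delta$ is measured as the Frobenius norm of $W - AB$ and uses greedy power iteration with deflation in place of a library SVD; all function names (`top_singular_triplet`, `low_rank_factorize`, `factorization_error`, `parameter_ratio`) are hypothetical and chosen for illustration only.

```python
import math
import random


def top_singular_triplet(w, iters=200, seed=0):
    """Power iteration on W^T W; returns (sigma, u, v) for the dominant singular triplet."""
    rng = random.Random(seed)
    m, n = len(w), len(w[0])
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        u = [sum(w[i][j] * v[j] for j in range(n)) for i in range(m)]      # W v
        v_new = [sum(w[i][j] * u[i] for i in range(m)) for j in range(n)]  # W^T (W v)
        norm = math.sqrt(sum(x * x for x in v_new))
        if norm == 0.0:  # W is (numerically) zero: nothing left to approximate
            return 0.0, [0.0] * m, [0.0] * n
        v = [x / norm for x in v_new]
    u = [sum(w[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma == 0.0:
        return 0.0, [0.0] * m, v
    return sigma, [x / sigma for x in u], v


def low_rank_factorize(w, k):
    """Approximate the m x n matrix W by A (m x k) times B (k x n) via greedy deflation."""
    m, n = len(w), len(w[0])
    residual = [row[:] for row in w]
    a = [[0.0] * k for _ in range(m)]
    b = [[0.0] * n for _ in range(k)]
    for r in range(k):
        sigma, u, v = top_singular_triplet(residual, seed=r)
        for i in range(m):
            a[i][r] = sigma * u[i]
        b[r] = v[:]
        # Remove the rank-1 component that was just captured.
        for i in range(m):
            for j in range(n):
                residual[i][j] -= sigma * u[i] * v[j]
    return a, b


def factorization_error(w, a, b):
    """delta = ||W - A B||_F (assumed definition of the factorization error)."""
    k = len(b)
    total = 0.0
    for i in range(len(w)):
        for j in range(len(w[0])):
            approx = sum(a[i][r] * b[r][j] for r in range(k))
            total += (w[i][j] - approx) ** 2
    return math.sqrt(total)


def parameter_ratio(m, n, k):
    """Fraction of the original m*n parameters kept by the two rank-k factors."""
    return k * (m + n) / (m * n)


if __name__ == "__main__":
    W = [[4.0, 1.0, 2.0], [2.0, 3.0, 1.0], [1.0, 0.0, 5.0], [3.0, 2.0, 2.0]]
    for k in (1, 2, 3):
        A, B = low_rank_factorize(W, k)
        print(k, factorization_error(W, A, B), parameter_ratio(4, 3, k))
```

The factors only save parameters when $k(m+n) < mn$, which is why the tiny matrix in the demo is illustrative rather than a realistic compression target. Lowering $k$ shrinks the model but raises $\delta$, and this sketch never consults the model's loss; that separation between the two objectives is the one the abstract identifies as the source of the remaining performance gap.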

