Low-rank factorization is a popular model compression technique that minimizes the error $\delta$ between the approximated and original weight matrices. Although models achieve performance close to the original when $\delta$ is optimized, a performance gap remains because low-rank factorization and model performance are optimized separately, leading to unavoidable losses. We address this issue by introducing a novel joint optimization strategy for lossless low-rank weight factorization that, for the first time, pushes the compressed model's performance beyond the original. Our approach begins with a theoretical analysis of the relationship between low-rank factorization and the model optimization objective, establishing a precise perturbation range within which matrix factorization errors affect model performance. We then reformulate this challenge as a numerical rank-deficiency problem with inequality constraints and develop a joint objective that simultaneously accounts for factorization error and model performance. Based on this analysis, we propose two optimization algorithms: \textbf{a lossless optimization algorithm} that maximizes model accuracy while ensuring compression, and \textbf{a compact optimization algorithm} that minimizes model size while preserving performance. These algorithms require no fine-tuning and can directly compress a wide range of deep models with lossless results. Our methods demonstrate robust efficacy across various vision and language tasks; for example, a ResNext50 model compressed by 70\% outperforms the original. Our code will be made public.
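For concreteness, a minimal sketch of the standard setting the abstract refers to (the notation here is ours, not taken from the paper): a weight matrix $W \in \mathbb{R}^{m \times n}$ is replaced by a rank-$r$ truncated SVD, and the factorization error $\delta$ is the quantity that separate-stage methods minimize in isolation:
\[
W \approx U_r \Sigma_r V_r^\top, \qquad \delta = \lVert W - U_r \Sigma_r V_r^\top \rVert_F = \Big( \sum_{i > r} \sigma_i^2 \Big)^{1/2},
\]
where $\sigma_1 \ge \sigma_2 \ge \dots$ are the singular values of $W$ (the Frobenius-norm identity is the Eckart--Young theorem). The joint strategy described above instead constrains $\delta$ by the perturbation range it induces on the task objective, rather than minimizing it independently of model performance.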