Supervised Fine-Tuning (SFT) empowers Large Language Models (LLMs) with exceptional performance on specialized tasks, but it yields dense, high-dimensional delta parameters that pose severe storage and distribution challenges. Singular Value Decomposition (SVD)-based compression offers a compact representation for such delta parameters, but existing methods adopt heuristic quantization without clarifying the underlying mechanisms, leading to poor generalizability. In this work, we propose PrinMix, a rigorous SVD-based framework that models quantization as an optimization problem, grounding its design in mathematical mechanisms. We first theoretically derive the quantization error and identify a key singular-value-dominated scaling mechanism, which mathematically proves the necessity of mixed-precision quantization. We then model the quantization scheme as a 0/1 Integer Linear Programming (ILP) problem, which yields optimal bit-budget-constrained solutions without empirical assumptions. Furthermore, PrinMix integrates a Reconstruction Target Correction (RTC) method to compensate for errors from the $\mathbf{V}$-then-$\mathbf{U}$ sequential quantization process. Extensive experiments confirm the effectiveness of PrinMix: for 7B LLMs, it outperforms the state-of-the-art Delta-CoMe on challenging benchmarks by 22.3% on AIME2024 and 6.1% on GQA.
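To make the bit-allocation idea concrete, the following is a minimal toy sketch (not the paper's implementation) of choosing per-group bit-widths under a total bit budget. It assumes a hypothetical per-group singular-value weight and a simplified uniform-quantizer error model proportional to $2^{-b}$; because each group picks exactly one bit-width from a small set, the 0/1 ILP can be solved here by exhaustive search.

```python
from itertools import product

# Hypothetical inputs (illustrative only): singular-value mass per group
# and a toy quantization-error model where error shrinks as 2^(-bits).
sigmas = [9.0, 4.0, 2.0, 1.0, 0.5, 0.25]
bit_choices = [2, 4, 8]

def q_error(b):
    # Toy uniform-quantizer error model, not the paper's derived error.
    return 2.0 ** (-b)

budget = 24  # total bits across all six groups (average 4 bits/group)

# Exhaustive search over the 0/1 assignment space: each group selects
# exactly one bit-width; minimize singular-value-weighted error subject
# to the bit budget.
best = None
for assign in product(bit_choices, repeat=len(sigmas)):
    if sum(assign) > budget:
        continue
    cost = sum(s * q_error(b) for s, b in zip(sigmas, assign))
    if best is None or cost < best[0]:
        best = (cost, assign)

print(best[1])  # groups with larger singular values receive more bits
```

Even in this toy setting, the optimizer concentrates precision on the groups carrying the largest singular values, which mirrors the singular-value-dominated scaling intuition; at realistic scale one would use an ILP solver rather than enumeration.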