Recent studies suggest that context-aware low-rank approximation is a useful tool for the compression and fine-tuning of modern large-scale neural networks. In this type of approximation, the norm is weighted by a matrix of input activations, which significantly improves metrics over the unweighted case. Nevertheless, existing methods for neural networks suffer from numerical instabilities because they rely on classical formulas involving explicit computation of the Gram matrix and its subsequent inversion. We demonstrate that this can degrade the approximation quality or produce numerically singular matrices. To address these limitations, we propose a novel inversion-free regularized framework that is based entirely on stable decompositions and overcomes the numerical pitfalls of prior art. Our method handles several challenging scenarios: (1) when calibration matrices exceed GPU memory capacity, (2) when input activation matrices are nearly singular, and even (3) when insufficient data prevents a unique approximation. For the latter, we prove that our solution converges to the desired approximation and derive explicit error bounds.
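To make the contrast concrete, the following is a minimal NumPy sketch, not the framework proposed here: it compares the classical route for the activation-weighted low-rank problem (forming the Gram matrix, taking a Cholesky factor, and applying an explicit inverse) with an inversion-free route that only uses a stable SVD. The weight matrix W, calibration activation matrix X, and rank r are illustrative placeholders, not quantities from this work.

```python
# Minimal sketch (NumPy), not the proposed framework: contrasts the classical
# Gram-matrix route with an inversion-free route for the activation-weighted problem
#     min over rank-r W_hat of || (W - W_hat) X ||_F,
# where W is a weight matrix and X is a (hypothetical) calibration activation matrix.
import numpy as np

def low_rank_gram(W, X, r):
    """Classical route: explicit Gram matrix, Cholesky factor, explicit inverse.
    Forming X @ X.T squares the condition number of X, and the Cholesky/inverse
    steps break down when the Gram matrix is numerically singular."""
    G = X @ X.T                                   # explicit Gram matrix
    S = np.linalg.cholesky(G)                     # fails if G is not positive definite
    U, sig, Vt = np.linalg.svd(W @ S, full_matrices=False)
    WS_r = (U[:, :r] * sig[:r]) @ Vt[:r]          # rank-r truncation of W @ S
    return WS_r @ np.linalg.inv(S)                # explicit inverse amplifies rounding error

def low_rank_inversion_free(W, X, r):
    """Inversion-free route: project W onto the span of the top-r left singular
    vectors of W @ X. No Gram matrix and no inverse are ever formed."""
    U, _, _ = np.linalg.svd(W @ X, full_matrices=False)
    U_r = U[:, :r]
    return U_r @ (U_r.T @ W)                      # rank-r factorization U_r (U_r^T W)

# Toy comparison with nearly collinear (numerically rank-deficient) activations.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
X = rng.standard_normal((32, 1)) @ rng.standard_normal((1, 256))   # one dominant direction
X += 1e-8 * rng.standard_normal(X.shape)                            # nearly singular
W_hat = low_rank_inversion_free(W, X, r=4)
print("relative weighted error:",
      np.linalg.norm((W - W_hat) @ X) / np.linalg.norm(W @ X))
```

When the activations are well conditioned, the two routes return essentially the same approximation; the point of the sketch is that the inversion-free one remains well defined as X approaches rank deficiency, whereas the Gram-based one must form and invert a nearly singular matrix.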