This paper explores a new post-hoc, training-free paradigm for compressing Large Language Models (LLMs) to facilitate their wider adoption in various computing environments. We delve into the challenges of LLM compression, notably the dependency of existing methods on extensive training data and computational resources. To address these limitations, we propose a training-free approach dubbed Activation-aware Singular Value Decomposition (ASVD). ASVD manages activation outliers by adjusting the weight matrix based on the activation distribution, improving decomposition accuracy and efficiency. Our method also addresses the varying sensitivity of different LLM layers to decomposition, using an iterative calibration process to find a layer-specific decomposition. Experiments demonstrate that ASVD can compress a network by 10%-20% without losing reasoning capacity. Additionally, it can be seamlessly integrated with other LLM compression paradigms, showcasing its flexible compatibility. Code and compressed models are available at https://github.com/hahnyuan/ASVD4LLM.
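The activation-aware step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the adjustment is a diagonal scaling of the weight's input channels, with each scale set to the mean absolute calibration activation raised to a power `alpha`. The function name `asvd_compress`, the parameters `alpha` and `eps`, and the data layout are hypothetical, and the iterative layer-wise calibration is omitted.

```python
import numpy as np

def asvd_compress(W, X, rank, alpha=0.5, eps=1e-6):
    """Low-rank factorization of W guided by calibration activations X.

    W: weight matrix of shape (out_features, in_features).
    X: calibration activations of shape (in_features, n_samples).
    Returns (A, B) with A of shape (out, rank) and B of shape (rank, in),
    so that W is approximated by A @ B.
    """
    # Assumed scaling: one positive factor per input channel, larger for
    # channels whose activations are larger on average (outlier channels).
    s = (np.abs(X).mean(axis=1) + eps) ** alpha

    # Scale the columns of W, i.e. compute W @ diag(s).
    W_scaled = W * s[None, :]

    # Truncated SVD of the scaled weight.
    U, sigma, Vt = np.linalg.svd(W_scaled, full_matrices=False)
    A = U[:, :rank] * sigma[:rank]      # absorb singular values into A
    B = Vt[:rank, :] / s[None, :]       # undo the scaling: multiply by diag(s)^-1
    return A, B

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 48))
    X = rng.standard_normal((48, 256))
    X[:4] *= 20.0                        # a few synthetic outlier channels
    A, B = asvd_compress(W, X, rank=16)
    rel_err = np.linalg.norm(W @ X - A @ (B @ X)) / np.linalg.norm(W @ X)
    print("relative output error:", rel_err)
    print("parameters kept:", (A.size + B.size) / W.size)
```

Storing `A` and `B` in place of `W` saves parameters whenever `rank * (out + in) < out * in`; how the rank is chosen per layer is left open in this sketch.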