Large Language Models (LLMs) are reshaping the research landscape in artificial intelligence: as parameter counts scale up, models unlock remarkable capabilities across many domains. However, further scaling is constrained by GPU memory and computational speed. To address these constraints, various weight compression methods have emerged, such as pruning and quantization. Given the low-rank nature of weight matrices in language models, reducing weights through matrix decomposition holds significant promise. In this paper, drawing on the intrinsic structure of LLMs, we propose a novel approach, Data-free Joint Rank-k Approximation, for compressing the parameter matrices. Notably, our method requires no corpus of any kind and remains orthogonal to pruning and quantization, so it can be used in conjunction with them. We prune 80% of the model's parameters while retaining 93.43% of the original performance, without any calibration data. Additionally, we explore the fundamental properties of LLM weight matrices that have undergone Rank-k Approximation and conduct comprehensive experiments to validate our hypothesis.
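As a point of reference for the term "rank-k approximation", the following is a minimal sketch of the standard single-matrix version using a truncated SVD in NumPy. It is not the joint, data-free method proposed in the paper, whose details the abstract does not give. The helper names `rank_k_approx` and `rank_for_budget`, the toy matrix sizes, and the 20% parameter budget (chosen only to mirror the 80% reduction mentioned above) are all illustrative assumptions.

```python
import numpy as np


def rank_k_approx(W: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Return factors (A, B) such that A @ B is the best rank-k approximation
    of W in the Frobenius norm (truncated SVD, Eckart-Young theorem)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * s[:k]  # (m, k): left singular vectors scaled by singular values
    B = Vt[:k, :]         # (k, n): top-k right singular vectors
    return A, B


def rank_for_budget(m: int, n: int, keep_ratio: float) -> int:
    """Largest k for which storing A (m x k) and B (k x n) needs at most
    keep_ratio * m * n parameters. Hypothetical helper for this example."""
    return max(1, int(keep_ratio * m * n / (m + n)))


# Toy matrix that is approximately low rank: a rank-4 signal plus small noise.
rng = np.random.default_rng(0)
m, n, true_rank = 64, 48, 4
W = rng.standard_normal((m, true_rank)) @ rng.standard_normal((true_rank, n))
W += 0.01 * rng.standard_normal((m, n))

k = rank_for_budget(m, n, keep_ratio=0.2)  # keep at most 20% of the parameters
A, B = rank_k_approx(W, k)
rel_error = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
kept = (A.size + B.size) / W.size
print(f"k={k}, kept={kept:.2%}, relative error={rel_error:.4f}")
```

Storing the two factors costs k(m + n) parameters instead of mn, which is where the compression comes from. The approximation is only accurate when the singular values of the matrix decay quickly, as they do in this toy example by construction.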