Recent research has shown that pruning large language models for inference is an effective way to improve efficiency, substantially reducing the number of model weights with minimal impact on performance. Interestingly, pruning can sometimes even enhance accuracy by removing noise that accumulates during training, particularly when the pruning is performed via matrix decompositions. However, prior work has focused primarily on decomposing individual weight matrices or on lower-precision techniques, which may fail to fully capture structural patterns. To address these limitations, we introduce TRAWL (Tensor Reduced and Approximated Weights for Large Language Models), a technique that applies tensor decomposition jointly across multiple weight matrices to denoise LLMs by capturing global structural patterns. Our experiments show that TRAWL improves model performance by up to 16% over baseline models on benchmark datasets, without requiring additional data, training, or fine-tuning.
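The core idea of decomposing across multiple weight matrices, rather than one at a time, can be sketched as follows. The abstract does not specify which tensor decomposition TRAWL uses, so this sketch uses a truncated higher-order SVD (HOSVD) as an illustrative stand-in: several same-shaped weight matrices are stacked into a third-order tensor, and each mode is projected onto its leading singular vectors to produce a low-rank, denoised approximation. The layer shapes and ranks below are arbitrary assumptions, not values from the paper.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move the given mode to axis 0 and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def fold(M, mode, shape):
    """Inverse of unfold: reshape back and move axis 0 to its original mode."""
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape(full), 0, mode)

def hosvd_approx(T, ranks):
    """Truncated HOSVD: project each mode onto its leading singular vectors,
    then reconstruct a low-multilinear-rank approximation of T."""
    factors = []
    for mode in range(T.ndim):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :ranks[mode]])
    # compress to the core tensor
    core = T
    for mode, U in enumerate(factors):
        new_shape = core.shape[:mode] + (U.shape[1],) + core.shape[mode + 1:]
        core = fold(U.T @ unfold(core, mode), mode, new_shape)
    # expand back to the original shape
    approx = core
    for mode, U in enumerate(factors):
        new_shape = approx.shape[:mode] + (U.shape[0],) + approx.shape[mode + 1:]
        approx = fold(U @ unfold(approx, mode), mode, new_shape)
    return approx

# Stack several hypothetical layer weight matrices into one 3-D tensor,
# so the decomposition sees shared structure across layers at once.
rng = np.random.default_rng(0)
layers = [rng.standard_normal((8, 8)) for _ in range(4)]
W = np.stack(layers)                         # shape (4, 8, 8)
W_denoised = hosvd_approx(W, ranks=(2, 4, 4))
```

The key contrast with single-matrix approaches: a per-matrix SVD can only exploit redundancy within one layer, while stacking lets the decomposition exploit patterns shared across layers, which is the kind of global structure the abstract refers to.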