Ever-growing large language models (LLMs), though opening a potential path toward artificial general intelligence, pose a daunting obstacle to on-device deployment. As one of the most well-established pre-LLM approaches to reducing model complexity, network pruning appears to lag behind in the era of LLMs, due mostly to the costly fine-tuning (or re-training) it requires given the massive volumes of model parameters and training data. To close this industry-academia gap, we introduce Dynamic Sparse No Training (DSnoT), a training-free fine-tuning approach that slightly updates sparse LLMs without expensive backpropagation or any weight updates. Inspired by Dynamic Sparse Training, DSnoT minimizes the reconstruction error between the dense and sparse LLMs by performing iterative weight pruning-and-growing on top of sparse LLMs. To accomplish this, DSnoT takes into account the anticipated reduction in reconstruction error for both pruning and growing, as well as the variance with respect to different input data when growing each weight. This procedure can be executed efficiently in linear time, since it obviates the need for backpropagation when fine-tuning LLMs. Extensive experiments on LLaMA-V1/V2, Vicuna, and OPT across various benchmarks demonstrate the effectiveness of DSnoT in enhancing the performance of sparse LLMs, especially at high sparsity levels. For instance, DSnoT outperforms the state-of-the-art Wanda by 26.79 perplexity at 70% sparsity with LLaMA-7B. Our paper offers fresh insights into how to fine-tune sparse LLMs in an efficient training-free manner and opens new avenues for scaling the great potential of sparsity to LLMs. Code is available at https://github.com/zyxxmu/DSnoT.
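To make the pruning-and-growing idea concrete, the following is a minimal NumPy sketch for a single output neuron. It is an illustration under simplifying assumptions, not the DSnoT implementation: the function names `recon_error` and `prune_and_grow`, the greedy swap rule, and the random calibration data are all hypothetical, and the sketch picks swaps by directly evaluating the squared reconstruction error rather than by the expected-error and variance criteria the abstract describes. What it does share with the description above is that only the binary mask changes, the weights themselves are never updated, and no gradients are computed.

```python
import numpy as np

def recon_error(X, w, mask):
    """Squared error between the dense and sparse outputs of one neuron."""
    return float(np.sum((X @ (w * mask) - X @ w) ** 2))

def prune_and_grow(X, w, mask, max_iters=50):
    """Greedy mask refinement (illustrative, not the authors' criteria).

    Each accepted step revives one pruned weight and prunes one kept
    weight, so the sparsity level is unchanged. Weight values are never
    modified; only the mask is.
    """
    mask = mask.copy()
    C = X * w                      # C[:, j] = contribution of weight j per sample
    r = X @ (w * mask) - X @ w     # current residual per sample
    best = float(np.sum(r ** 2))
    for _ in range(max_iters):
        pruned = np.flatnonzero(mask == 0)
        kept = np.flatnonzero(mask == 1)
        if pruned.size == 0 or kept.size == 0:
            break
        # Grow: revive the pruned weight that yields the smallest error.
        grow_errs = np.sum((r[:, None] + C[:, pruned]) ** 2, axis=0)
        j = pruned[np.argmin(grow_errs)]
        r_grown = r + C[:, j]
        # Prune: drop the previously kept weight that hurts the least.
        prune_errs = np.sum((r_grown[:, None] - C[:, kept]) ** 2, axis=0)
        k = kept[np.argmin(prune_errs)]
        new_err = float(np.min(prune_errs))
        if new_err >= best:        # stop when no swap improves the error
            break
        mask[j], mask[k] = 1, 0
        r = r_grown - C[:, k]
        best = new_err
    return mask

# Toy usage with synthetic calibration data (assumed, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 16))      # 64 calibration samples, 16 input features
w = rng.normal(size=16)            # dense weights of one neuron
mask0 = np.ones(16)
mask0[np.argsort(np.abs(w))[:8]] = 0   # 50% magnitude pruning as a starting mask
mask1 = prune_and_grow(X, w, mask0)
err_before = recon_error(X, w, mask0)
err_after = recon_error(X, w, mask1)
```

Because a swap is accepted only when it lowers the error, `err_after` never exceeds `err_before`, and the number of nonzero mask entries stays the same. Each iteration costs time proportional to the number of samples times the number of weights in the row, with no backward pass.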