The ever-growing scale of large language models (LLMs), while opening a potential path toward artificial general intelligence, poses a daunting obstacle to their on-device deployment. As one of the most well-established pre-LLM approaches to reducing model complexity, network pruning appears to lag behind in the era of LLMs, mostly because it requires costly fine-tuning (or re-training) over massive volumes of model parameters and training data. To close this industry-academia gap, we introduce Dynamic Sparse No Training (DSnoT), a training-free fine-tuning approach that slightly updates sparse LLMs without expensive backpropagation or any weight updates. Inspired by Dynamic Sparse Training, DSnoT minimizes the reconstruction error between the dense and sparse LLMs by performing iterative weight pruning-and-growing on top of sparse LLMs. To this end, DSnoT takes into account the anticipated reduction in reconstruction error for both pruning and growing, as well as the variance w.r.t. different input data when growing each weight. This procedure can be executed efficiently in linear time, since it obviates the need for backpropagation when fine-tuning LLMs. Extensive experiments on LLaMA-V1/V2, Vicuna, and OPT across various benchmarks demonstrate the effectiveness of DSnoT in enhancing the performance of sparse LLMs, especially at high sparsity levels. For instance, DSnoT outperforms the state-of-the-art Wanda by 26.79 perplexity at 70% sparsity with LLaMA-7B. Our paper offers fresh insights into how to fine-tune sparse LLMs in an efficient, training-free manner and opens new avenues for scaling the great potential of sparsity to LLMs. Code is available at https://github.com/zxyxmu/DSnoT.
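To make the pruning-and-growing idea concrete, the following is a minimal sketch for a single output neuron, not the DSnoT implementation. It assumes a small set of calibration inputs `X`, and it swaps one pruned weight for one kept weight whenever the swap lowers the squared reconstruction error between the dense and sparse outputs. The function names (`reconstruction_error`, `prune_and_grow`) are hypothetical, the swap is chosen by exhaustive search for clarity rather than by the linear-time criteria described above, and the variance term is omitted. Weight values are never modified and the number of kept weights stays constant.

```python
def reconstruction_error(w, mask, X):
    """Squared error between dense and sparse outputs of one neuron.

    The dense-minus-sparse output for an input x equals the summed
    contribution of the pruned weights (mask entry False).
    """
    err = 0.0
    for x in X:
        diff = sum(wi * xi for wi, mi, xi in zip(w, mask, x) if not mi)
        err += diff * diff
    return err


def prune_and_grow(w, mask, X, max_iters=50):
    """Greedy mask refinement (illustrative assumption, not the paper's rule).

    Each iteration revives one pruned weight and prunes one kept weight,
    choosing the pair that most reduces the reconstruction error.
    Stops when no swap helps. Sparsity is preserved; weights are untouched.
    """
    mask = list(mask)
    best = reconstruction_error(w, mask, X)
    for _ in range(max_iters):
        best_swap = None
        for g in range(len(w)):          # candidate to grow (currently pruned)
            if mask[g]:
                continue
            for p in range(len(w)):      # candidate to prune (currently kept)
                if not mask[p]:
                    continue
                mask[g], mask[p] = True, False      # try the swap
                err = reconstruction_error(w, mask, X)
                mask[g], mask[p] = False, True      # undo it
                if err < best - 1e-12:
                    best, best_swap = err, (g, p)
        if best_swap is None:
            break
        g, p = best_swap
        mask[g], mask[p] = True, False
    return mask, best


# Toy example: four weights at 50% sparsity, with a poor initial mask
# that keeps the smallest weight and prunes a large one.
w = [0.1, -2.0, 0.5, 1.5]
mask = [True, True, False, False]
X = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
new_mask, new_err = prune_and_grow(w, mask, X)
```

In this toy case a single swap revives the weight `1.5` and prunes the weight `0.1`, after which no further swap lowers the error. A practical method would replace the exhaustive pair search with cheap per-weight scores so that the cost stays linear.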