Since large language models (LLMs) have demonstrated high-quality performance on many complex language tasks, there is great interest in bringing them to mobile devices for faster responses and better privacy protection. However, the size of LLMs (i.e., billions of parameters) requires highly effective compression to fit into storage-limited devices. Among many compression techniques, weight clustering, a form of non-linear quantization, is one of the leading candidates for LLM compression and is supported by modern smartphones. Yet its training overhead is prohibitively high for LLM fine-tuning. In particular, Differentiable KMeans Clustering (DKM) has shown the state-of-the-art trade-off between compression ratio and accuracy regression, but its large memory complexity makes it nearly impossible to apply to train-time LLM compression. In this paper, we propose a memory-efficient DKM implementation, eDKM, powered by novel techniques that reduce the memory footprint of DKM by orders of magnitude. For a given tensor to be saved on CPU for the backward pass of DKM, we first check whether a duplicate of that tensor has already been copied to CPU and, if not, compress the tensor by applying uniquification and sharding. Our experimental results demonstrate that eDKM can fine-tune and compress a pretrained LLaMA 7B model from 12.6 GB to 2.5 GB (3 bits/weight) with the Alpaca dataset by reducing the train-time memory footprint of a decoder layer by 130$\times$, while delivering good accuracy on a broad range of LLM benchmarks (e.g., 77.7% on PIQA and 66.1% on WinoGrande).
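The save path described above (duplicate check, then uniquification, then sharding) can be illustrated with a minimal pure-Python sketch. It is not the eDKM implementation and uses no PyTorch API. It assumes a DKM-style attention map computed as a softmax over negative squared weight-to-centroid distances, so identical weights always yield identical rows. The duplicate check is modeled with a caller-supplied key, standing in for something like the identity of a tensor's underlying storage. Uniquification stores each distinct row once plus one index per weight, and sharding splits the index list into near-equal pieces (for example, one per device). The names `soft_assignment`, `uniquify`, `shard`, and `CpuSaveRegistry` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Tuple[float, ...]


def soft_assignment(weights: Sequence[float], centroids: Sequence[float],
                    temperature: float = 1.0) -> List[Row]:
    """Assumed DKM-style attention map: one softmax row over centroids per weight."""
    rows = []
    for w in weights:
        logits = [-((w - c) ** 2) / temperature for c in centroids]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - peak) for z in logits]
        total = sum(exps)
        rows.append(tuple(e / total for e in exps))
    return rows


def uniquify(rows: Sequence[Row]) -> Tuple[List[Row], List[int]]:
    """Keep each distinct row once, plus an index into them per original row."""
    unique_rows: List[Row] = []
    position: Dict[Row, int] = {}
    inverse: List[int] = []
    for row in rows:
        if row not in position:
            position[row] = len(unique_rows)
            unique_rows.append(row)
        inverse.append(position[row])
    return unique_rows, inverse


def shard(indices: List[int], num_shards: int) -> List[List[int]]:
    """Split the index list into num_shards contiguous, near-equal pieces."""
    base, extra = divmod(len(indices), num_shards)
    shards, start = [], 0
    for s in range(num_shards):
        end = start + base + (1 if s < extra else 0)
        shards.append(indices[start:end])
        start = end
    return shards


@dataclass
class SavedEntry:
    unique_rows: List[Row]
    index_shards: List[List[int]]


class CpuSaveRegistry:
    """Hypothetical host-side store for tensors saved for the backward pass."""

    def __init__(self, num_shards: int):
        self.num_shards = num_shards
        self.entries: Dict[Hashable, SavedEntry] = {}

    def save(self, key: Hashable, rows: Sequence[Row]) -> Hashable:
        # Duplicate check first: a tensor already saved under the same key
        # is not copied or compressed again.
        if key not in self.entries:
            unique_rows, inverse = uniquify(rows)
            self.entries[key] = SavedEntry(unique_rows, shard(inverse, self.num_shards))
        return key

    def load(self, key: Hashable) -> List[Row]:
        # Walk the shards in order and expand each index back to its row.
        entry = self.entries[key]
        return [entry.unique_rows[i] for piece in entry.index_shards for i in piece]

    def footprint(self, key: Hashable) -> Tuple[int, int]:
        """(floats stored, indices stored) for one saved tensor."""
        entry = self.entries[key]
        floats = sum(len(row) for row in entry.unique_rows)
        indices = sum(len(piece) for piece in entry.index_shards)
        return floats, indices


if __name__ == "__main__":
    weights = [0.5, -0.25, 0.5, 0.5, -0.25, 1.0]  # 6 weights, 3 distinct values
    centroids = [-0.5, 0.0, 0.5, 1.0]             # K = 4 clusters
    attn = soft_assignment(weights, centroids)
    registry = CpuSaveRegistry(num_shards=4)
    handle = registry.save("layer0.attn", attn)
    registry.save("layer0.attn", attn)            # duplicate: skipped
    print(registry.footprint(handle))             # (12, 6), versus 24 floats raw
    assert registry.load(handle) == attn
```

With 6 weights taking 3 distinct values and 4 centroids, the sketch stores 12 floats plus 6 indices instead of the 24-float raw map. For a real 16-bit weight tensor, the number of distinct values is bounded by 2^16 regardless of the layer's size, which suggests why this kind of uniquification can pay off for large layers.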