With the growth of model sizes and the scale of their deployment, the sheer size of models burdens the infrastructure, requiring more network bandwidth and more storage to accommodate them. While there is a vast model-compression literature on deleting parts of the model weights for faster inference, we investigate a more traditional type of compression - one that represents the model in a compact form, coupled with a decompression algorithm that returns it to its original form and size - namely lossless compression. We present ZipNN, a lossless compression method tailored to neural networks. Somewhat surprisingly, we show that specialized lossless compression can achieve significant network and storage reductions on popular models, often saving 33% and at times reducing model size by over 50%. We investigate the sources of model compressibility and introduce specialized compression variants tailored to models that further increase the effectiveness of compression. On popular models (e.g., Llama 3), ZipNN shows space savings that are over 17% better than vanilla compression, while also improving compression and decompression speeds by 62%. We estimate that these methods could save over an ExaByte per month of network traffic downloaded from a large model hub like Hugging Face.
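The abstract does not spell out where the compressibility of model weights comes from. A minimal sketch of one plausible mechanism (an assumption, not ZipNN's actual implementation): the high-order bytes of floating-point weights, which hold the sign and exponent, are far more skewed than the mantissa bytes, so grouping the byte streams before applying a general-purpose compressor can beat compressing the interleaved bytes. The sketch below uses Python's standard-library `zlib` on synthetic 2-byte (BF16-like) weights.

```python
# Hypothetical illustration: byte grouping of floating-point weights
# before lossless compression. Not ZipNN's code - just the general idea.
import random
import struct
import zlib

random.seed(0)

def to_bf16_bytes(x: float) -> bytes:
    # bfloat16 keeps the top 16 bits of a float32; in big-endian order
    # byte 0 holds the sign + most exponent bits, byte 1 mantissa bits.
    return struct.pack(">f", x)[:2]

# Simulate trained weights: small Gaussian values, so exponents cluster
# around a few magnitudes while mantissas look close to random.
weights = [random.gauss(0.0, 0.02) for _ in range(100_000)]
raw = b"".join(to_bf16_bytes(w) for w in weights)

# Baseline: compress the interleaved byte stream as-is.
baseline = len(zlib.compress(raw, 9))

# Byte grouping: split exponent bytes and mantissa bytes into separate
# streams and compress each on its own.
hi = raw[0::2]   # sign + exponent bytes (low entropy, highly skewed)
lo = raw[1::2]   # mantissa bytes (high entropy, nearly incompressible)
grouped = len(zlib.compress(hi, 9)) + len(zlib.compress(lo, 9))

print(f"original: {len(raw)} bytes")
print(f"interleaved zlib: {baseline} bytes")
print(f"byte-grouped zlib: {grouped} bytes")
```

On this synthetic data the grouped streams compress noticeably better than the interleaved one, because the compressor's entropy coding is no longer diluted by the near-random mantissa bytes sitting between the skewed exponent bytes.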