High-dimensional token embeddings underpin Large Language Models (LLMs), as they can capture subtle semantic information and significantly enhance the modelling of complex language patterns. However, this high dimensionality also introduces a considerable number of model parameters and prohibitively high storage and memory requirements, which are particularly unaffordable for low-end devices. Targeting scenarios with no extra training data and insufficient computational resources, we propose a training-free model compression approach based on the Tensor-Train Decomposition (TTD), whereby each pre-trained token embedding is converted into a lower-dimensional Matrix Product State (MPS). We then comprehensively investigate the low-rank structures extracted by this approach, in terms of the compression ratio, language task performance, and latency on a typical low-end device (i.e., a Raspberry Pi). Taking GPT-family models (i.e., GPT-2 and CerebrasGPT) as case studies, our approach theoretically yields $46.89\%$ fewer parameters for the entire model, with a compression ratio of $39.38\times$ to $65.64\times$ for the embedding layers. With suitable hyperparameter choices, the model compressed with our approach achieves language task performance comparable to the original model at around $2.0\times$ embedding-layer compression. This empirically demonstrates the existence of low-rank structure in GPT-family models, and shows that about half of the parameters in the embedding layers are redundant.
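To make the embedding-to-MPS conversion concrete, the following is a minimal sketch of the standard TT-SVD procedure: a pre-trained embedding vector is reshaped into a higher-order tensor and factorized into MPS cores via sequential truncated SVDs. The tensor shape `(4, 4, 4, 12)` for a 768-dimensional embedding and the rank cap `max_rank` are illustrative assumptions, not the hyperparameters used in the paper.

```python
import numpy as np

def tt_decompose(vec, shape, max_rank):
    """TT-SVD: reshape a vector into a tensor of the given shape and
    factor it into MPS cores via sequential truncated SVDs."""
    cores = []
    t = vec.reshape(shape)
    r_prev = 1
    for k in range(len(shape) - 1):
        # Unfold: (previous rank * current mode) x (remaining modes)
        m = t.reshape(r_prev * shape[k], -1)
        u, s, vt = np.linalg.svd(m, full_matrices=False)
        r = min(max_rank, len(s))  # truncate to the rank cap
        cores.append(u[:, :r].reshape(r_prev, shape[k], r))
        # Carry the remainder into the next step
        t = (np.diag(s[:r]) @ vt[:r]).reshape(r, *shape[k + 1:])
        r_prev = r
    cores.append(t.reshape(r_prev, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract MPS cores back into a flat vector."""
    out = cores[0]
    for c in cores[1:]:
        out = np.tensordot(out, c, axes=([-1], [0]))
    return out.reshape(-1)

rng = np.random.default_rng(0)
emb = rng.standard_normal(768)          # one token embedding (illustrative)
cores = tt_decompose(emb, (4, 4, 4, 12), max_rank=2)
n_params = sum(c.size for c in cores)   # far fewer than 768 at low rank
```

At full rank the reconstruction is exact; lowering `max_rank` trades reconstruction accuracy for a smaller parameter count, which is the compression knob the abstract refers to.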