High-dimensional token embeddings underpin Large Language Models (LLMs), as they capture subtle semantic information and significantly enhance the modelling of complex language patterns. However, this high dimensionality also introduces a considerable number of model parameters and a prohibitively high storage cost. To address this issue, this work proposes an approach based on the Tensor-Train Decomposition (TTD), where each token embedding is treated as a Matrix Product State (MPS) that can be efficiently computed in a distributed manner. Experimental results on GPT-2 demonstrate that our approach can compress the embedding layer by a factor of up to 38.40 and that, at a compression factor of 3.31, the compressed model even outperforms the original GPT-2 model.
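To make the idea concrete, the following is a minimal sketch of compressing a single token embedding with a tensor-train (MPS) factorisation via the standard TT-SVD procedure. It is an illustration under stated assumptions, not the authors' implementation: the function names `tt_decompose` and `tt_reconstruct`, the reshaping of a 768-dimensional vector into a (4, 4, 4, 4, 3) tensor, and the rank cap `max_rank` are all hypothetical choices made for this example.

```python
import numpy as np


def tt_decompose(vec, dims, max_rank):
    """TT-SVD of a 1-D vector reshaped into a tensor of shape `dims`.

    Returns a list of cores; core k has shape (r_{k-1}, dims[k], r_k),
    with r_0 = r_d = 1 and every internal rank capped at `max_rank`.
    """
    vec = np.asarray(vec, dtype=np.float64)
    assert vec.size == int(np.prod(dims)), "dims must factorise len(vec)"
    cores = []
    rank = 1
    rest = vec.reshape(1, -1)
    for n in dims[:-1]:
        # Unfold: rows index (previous rank, current mode), columns the rest.
        rest = rest.reshape(rank * n, -1)
        u, s, vt = np.linalg.svd(rest, full_matrices=False)
        new_rank = min(max_rank, s.size)  # truncate to the rank cap
        cores.append(u[:, :new_rank].reshape(rank, n, new_rank))
        # Carry the singular values forward into the remaining factor.
        rest = s[:new_rank, None] * vt[:new_rank]
        rank = new_rank
    cores.append(rest.reshape(rank, dims[-1], 1))
    return cores


def tt_reconstruct(cores):
    """Contract the chain of cores back into a flat vector."""
    out = cores[0]
    for core in cores[1:]:
        # Contract the trailing rank index with the next core's leading one.
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.reshape(-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    embedding = rng.standard_normal(768)  # stand-in for one token embedding
    dims = (4, 4, 4, 4, 3)                # assumed factorisation of 768

    cores = tt_decompose(embedding, dims, max_rank=2)
    n_params = sum(core.size for core in cores)
    approx = tt_reconstruct(cores)
    rel_err = np.linalg.norm(approx - embedding) / np.linalg.norm(embedding)
    print(f"parameters: {embedding.size} -> {n_params}")
    print(f"relative reconstruction error: {rel_err:.3f}")
```

Because each token vector is factorised on its own, calls to `tt_decompose` for different tokens are independent and could be run in parallel, which is one plausible reading of the "distributed" computation mentioned above. A random vector is close to a worst case for low-rank truncation, so the error printed here says nothing about the accuracy achievable on trained embeddings.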