We introduce TQCompressor, a novel method for neural network model compression based on improved tensor decompositions. We examine the challenges posed by the computational and storage demands of pre-trained language models in NLP tasks and propose a permutation-based enhancement to Kronecker decomposition. This enhancement reduces the loss of model expressivity usually associated with factorization. We demonstrate the method on GPT-2$_{small}$. The resulting compressed model, TQCompressedGPT-2, has 81M parameters compared with 124M in GPT-2$_{small}$, and we make it publicly available. We further improve the performance of TQCompressedGPT-2 through a training strategy based on multi-step knowledge distillation that uses only 3.1% of the OpenWebText dataset. TQCompressedGPT-2 outperforms DistilGPT-2 and KnGPT-2 in comparative evaluations, an advance toward the efficient and effective deployment of models in resource-constrained environments.
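To illustrate why permuting rows and columns can help a Kronecker decomposition, the following NumPy sketch fits a weight matrix $W$ with a Kronecker product $A \otimes B$ after reordering the rows and columns of $W$. It assumes the nearest-Kronecker-product solution of Van Loan and Pitsianis (rearrange $W$, then take a rank-1 SVD) as the factorization step. It also sidesteps the hard part: the synthetic $W$ is built as a shuffled Kronecker product, and the sketch simply reuses the shuffle that generated it instead of searching for a permutation. How TQCompressor actually finds its permutations is not reproduced here, and the helper names `nearest_kronecker`, `permuted_kronecker`, and `rel_err` are hypothetical.

```python
import numpy as np


def nearest_kronecker(W, m1, n1, m2, n2):
    """Frobenius-optimal A (m1 x n1), B (m2 x n2) with W ≈ kron(A, B).

    Van Loan-Pitsianis rearrangement: entry W[i1*m2 + i2, j1*n2 + j2] is moved to
    row i1*n1 + j1, column i2*n2 + j2, so kron(A, B) becomes the rank-1 matrix
    vec(A) vec(B)^T and the best fit is given by the leading singular triplet.
    """
    R = W.reshape(m1, m2, n1, n2).transpose(0, 2, 1, 3).reshape(m1 * n1, m2 * n2)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    A = np.sqrt(s[0]) * U[:, 0].reshape(m1, n1)
    B = np.sqrt(s[0]) * Vt[0].reshape(m2, n2)
    return A, B


def permuted_kronecker(W, rows, cols, m1, n1, m2, n2):
    """Fit kron(A, B) to W with its rows/columns reordered, then scatter back."""
    A, B = nearest_kronecker(W[np.ix_(rows, cols)], m1, n1, m2, n2)
    W_hat = np.empty_like(W)
    W_hat[np.ix_(rows, cols)] = np.kron(A, B)
    return W_hat


def rel_err(W, W_hat):
    """Relative Frobenius-norm approximation error."""
    return np.linalg.norm(W - W_hat) / np.linalg.norm(W)


rng = np.random.default_rng(0)
m1, n1, m2, n2 = 4, 6, 8, 8
K = np.kron(rng.standard_normal((m1, n1)), rng.standard_normal((m2, n2)))  # 32 x 48

# Synthetic "weight": a Kronecker product whose rows and columns were shuffled,
# so that W[row_perm[i], col_perm[j]] == K[i, j].
row_perm = rng.permutation(m1 * m2)
col_perm = rng.permutation(n1 * n2)
W = np.empty_like(K)
W[np.ix_(row_perm, col_perm)] = K

# Plain Kronecker decomposition (identity permutation) vs. the generating permutation.
identity_rows, identity_cols = np.arange(m1 * m2), np.arange(n1 * n2)
err_identity = rel_err(W, permuted_kronecker(W, identity_rows, identity_cols, m1, n1, m2, n2))
err_permuted = rel_err(W, permuted_kronecker(W, row_perm, col_perm, m1, n1, m2, n2))

dense_params = W.size            # (m1*m2) * (n1*n2) entries in the dense matrix
kron_params = m1 * n1 + m2 * n2  # entries of A and B only

print(f"relative error, no permutation:   {err_identity:.3f}")
print(f"relative error, with permutation: {err_permuted:.2e}")
print("dense params:", dense_params, "kronecker params:", kron_params)  # dense params: 1536 kronecker params: 88
```

On this synthetic matrix the unpermuted fit leaves a large residual, while the generating permutation recovers $W$ up to floating-point error, and the two factors hold 88 values instead of 1,536. Real pre-trained weights are not exact shuffled Kronecker products, so one would expect a smaller, though still useful, reduction in approximation error; in this sketch the permutations are stored as index arrays rather than trainable weights.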