Traditional neural word embeddings usually depend on a rich, diverse vocabulary. However, language models tend to cover large vocabularies through their word embedding parameters; in multilingual language models in particular, these parameters generally account for a significant share of all learned parameters. In this work, we present a new compact embedding structure that reduces the memory footprint of pre-trained language models at a cost of up to 4% absolute accuracy. Embedding vectors are reconstructed from a set of subspace embeddings and an assignment procedure based on the contextual relationships among tokens in pre-trained language models. We calibrate the subspace embedding structure to masked language models and evaluate our compact embedding structure on similarity and textual entailment tasks as well as sentence and paraphrase tasks. Our experimental evaluation shows that the subspace embeddings achieve compression rates beyond 99.8% relative to the original embeddings of the language models on the XNLI and GLUE benchmark suites.
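The idea of rebuilding each embedding vector from a few shared subspace tables can be illustrated with a minimal sketch. It is not the method described above: the function names, the toy sizes, and the concatenation of sub-vectors are all assumptions made for illustration, and the base-k digit assignment in `assign_indices` is an arbitrary stand-in for the context-based assignment procedure, which the abstract does not specify.

```python
import random


def assign_indices(token_id, num_subspaces, codebook_size):
    """Map a token id to one row index per subspace.

    Assumption: the indices are the base-`codebook_size` digits of the id.
    This only guarantees unique index tuples; it ignores token context.
    """
    indices = []
    for _ in range(num_subspaces):
        indices.append(token_id % codebook_size)
        token_id //= codebook_size
    return tuple(indices)


def make_tables(num_subspaces, codebook_size, sub_dim, seed=0):
    """Create one small randomly initialised table per subspace."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(sub_dim)] for _ in range(codebook_size)]
        for _ in range(num_subspaces)
    ]


def reconstruct(token_id, tables):
    """Rebuild a full vector by concatenating one row from each subspace table."""
    indices = assign_indices(token_id, len(tables), len(tables[0]))
    vector = []
    for table, row in zip(tables, indices):
        vector.extend(table[row])
    return vector


def compression_rate(vocab_size, dim, num_subspaces, codebook_size):
    """Fraction of embedding parameters saved relative to a full lookup table."""
    original = vocab_size * dim
    compact = num_subspaces * codebook_size * (dim // num_subspaces)
    return 1.0 - compact / original


# Toy sizes (hypothetical): 1000 tokens, 12 dimensions, 3 subspaces of 10 rows each.
VOCAB_SIZE, DIM, NUM_SUBSPACES, CODEBOOK_SIZE = 1000, 12, 3, 10
tables = make_tables(NUM_SUBSPACES, CODEBOOK_SIZE, DIM // NUM_SUBSPACES)
vector = reconstruct(123, tables)
rate = compression_rate(VOCAB_SIZE, DIM, NUM_SUBSPACES, CODEBOOK_SIZE)
```

With these toy sizes the three tables hold 120 parameters instead of 12,000, so `rate` is about 0.99. The figure only shows how sharing rows across tokens shrinks the parameter count; it says nothing about the accuracy trade-off or the compression rates reported above.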