Many natural language inference tasks, including semantic retrieval, require vector representations of sentences. In recent years, pre-trained large language models have proven highly effective at computing such representations. However, they produce high-dimensional sentence embeddings, and in practice there is a clear performance gap between large and small models. Because hardware limits both memory and inference time, it is desirable to obtain comparable results with a smaller model, which is usually a distilled version of the large language model. In this paper, we assess model distillation of the sentence representation model Sentence-BERT. We augment the pre-trained distilled model with a projection layer that is additionally trained on the Maximum Coding Rate Reduction (MCR2) objective, a novel approach developed for general-purpose manifold clustering. We demonstrate that the resulting language model, with reduced complexity and a smaller sentence embedding size, achieves comparable results on semantic retrieval benchmarks.
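The following is a minimal NumPy sketch that illustrates the MCR2 objective. It computes the rate reduction ΔR(Z, Π, ε) = R(Z, ε) − R^c(Z, ε | Π) from the MCR2 formulation, where R(Z, ε) = ½ log det(I + d/(mε²) Z Zᵀ) is the coding rate of all m feature columns and R^c is the sum of per-class coding rates, each weighted by the class's share of the samples. The sketch applies ΔR to the output of a linear projection. Several choices here are hypothetical and made only for illustration: the function names, ε = 0.5, the projection shape (768 → 128), and the random stand-in embeddings and labels. The abstract does not say how the projection layer is parameterized or where its grouping labels come from. In practice, W would be trained by gradient ascent on ΔR with an autodiff framework rather than left random.

```python
import numpy as np


def coding_rate(Z: np.ndarray, eps: float = 0.5) -> float:
    """R(Z, eps) = 1/2 logdet(I + d/(m eps^2) Z Z^T) for features Z of shape (d, m)."""
    d, m = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (m * eps**2)) * (Z @ Z.T))
    return 0.5 * logdet


def class_coding_rate(Z: np.ndarray, labels: np.ndarray, eps: float = 0.5) -> float:
    """R^c: per-class coding rates, each weighted by m_c / (2m) (hard class memberships)."""
    d, m = Z.shape
    total = 0.0
    for c in np.unique(labels):
        Zc = Z[:, labels == c]          # columns belonging to class c
        mc = Zc.shape[1]
        _, logdet = np.linalg.slogdet(np.eye(d) + (d / (mc * eps**2)) * (Zc @ Zc.T))
        total += (mc / (2 * m)) * logdet
    return total


def mcr2(Z: np.ndarray, labels: np.ndarray, eps: float = 0.5) -> float:
    """Rate reduction Delta R = R(Z) - R^c(Z | labels); the MCR2 objective maximizes it."""
    return coding_rate(Z, eps) - class_coding_rate(Z, labels, eps)


def project(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Hypothetical projection layer: W (k x d) maps d-dim embeddings (columns of X)
    to k dims; each projected column is then L2-normalized onto the unit sphere."""
    Z = W @ X
    return Z / np.linalg.norm(Z, axis=0, keepdims=True)


# Usage with random stand-ins for 64 sentence embeddings of size 768 in 4 groups.
rng = np.random.default_rng(0)
X = rng.normal(size=(768, 64))
labels = np.repeat([0, 1, 2, 3], 16)
W = rng.normal(size=(128, 768)) / np.sqrt(768)
Z = project(X, W)
print(mcr2(Z, labels))
```

Because log det is concave, ΔR is never negative. It equals zero when every sample falls in a single class. Maximizing ΔR therefore pushes the projected embeddings of different groups toward spanning large, mutually distinct subspaces, while embeddings within each group are compressed.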