Common approaches rely on fixed-length embedding vectors from language models as sentence embeddings for downstream tasks such as semantic textual similarity (STS). Such methods offer limited flexibility, since the computational constraints and budgets of downstream applications vary and are not known in advance. Matryoshka Representation Learning (MRL) (Kusupati et al., 2022) encodes information at finer granularities, i.e., with lower embedding dimensions, to adaptively accommodate ad hoc tasks. Comparable accuracy can be achieved with a smaller embedding size, leading to speedups in downstream tasks. Despite its improved efficiency, MRL still requires traversing all Transformer layers before obtaining the embedding, and these layers remain the dominant factor in time and memory consumption. This raises two questions: does the fixed number of Transformer layers affect representation quality, and is it feasible to derive sentence representations from intermediate layers? In this paper, we introduce a novel sentence embedding model called Two-dimensional Matryoshka Sentence Embedding (2DMSE). It supports elastic settings for both embedding sizes and Transformer layers, offering greater flexibility and efficiency than MRL. We conduct extensive experiments on STS tasks and downstream applications. The experimental results demonstrate the effectiveness of our proposed model in dynamically supporting different embedding sizes and Transformer layers, allowing it to be highly adaptable to various scenarios.
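To make the "smaller embedding size" dimension concrete, the following is a minimal NumPy sketch of the truncate-and-renormalize idea behind MRL-style embeddings: the first `d` coordinates of a full embedding are kept and re-normalized, and cosine similarity is computed at that reduced size. The function names and the random stand-in vectors are illustrative, not part of the MRL or 2DMSE implementations.

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates and re-normalize to unit length
    (the Matryoshka-style truncation described in the abstract)."""
    sub = vec[:dim]
    return sub / np.linalg.norm(sub)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; assumes both inputs are already unit-normalized."""
    return float(np.dot(a, b))

# Stand-ins for two sentence embeddings from a 768-dim model.
rng = np.random.default_rng(0)
full_a = rng.normal(size=768)
full_b = full_a + 0.1 * rng.normal(size=768)  # a semantically "nearby" sentence

# Similarity can be evaluated at several embedding sizes from one forward pass.
for d in (64, 256, 768):
    sim = cosine(truncate_embedding(full_a, d), truncate_embedding(full_b, d))
    print(f"dim={d:4d}  cosine={sim:.3f}")
```

2DMSE adds a second elastic axis on top of this: the embedding may also be taken from an intermediate Transformer layer rather than only the final one, so both `d` and the layer depth can be chosen per deployment budget.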