Common approaches rely on fixed-length embedding vectors from language models as sentence embeddings for downstream tasks such as semantic textual similarity (STS). Such methods offer limited flexibility because computational constraints and budgets vary across applications and are often unknown in advance. Matryoshka Representation Learning (MRL) \cite{aditya2022matryoshka} encodes information at multiple granularities, i.e., at lower embedding dimensions, to adaptively accommodate \emph{ad hoc} tasks: comparable accuracy can be achieved with a smaller embedding size, yielding speedups in downstream tasks. Despite this improved efficiency, MRL still requires a full forward pass through all Transformer layers before the embedding is obtained, and these layers remain the dominant factor in time and memory consumption. This raises two questions: does the fixed number of Transformer layers limit representation quality, and is it feasible to derive sentence representations from intermediate layers? In this paper, we introduce a novel sentence embedding model called \textit{Two-dimensional Matryoshka Sentence Embedding} (2DMSE)\footnote{Our code is available at \url{https://github.com/SeanLee97/AnglE/blob/main/README_2DMSE.md}.}. It supports elastic settings for both embedding sizes and Transformer layers, offering greater flexibility and efficiency than MRL. We conduct extensive experiments on STS tasks and downstream applications. The results demonstrate that our model dynamically supports different embedding sizes and Transformer layers, making it highly adaptable to diverse scenarios.