Sentence embedding methods have made remarkable progress, yet they still struggle to capture the implicit semantics of sentences. This can be attributed to an inherent limitation of conventional sentence embedding methods, which assign only a single vector to each sentence. To overcome this limitation, we propose DualCSE, a sentence embedding method that assigns two embeddings to each sentence: one representing its explicit semantics and the other its implicit semantics. Both embeddings coexist in a shared space, so the desired semantics can be selected for a specific purpose such as information retrieval or text classification. Experimental results demonstrate that DualCSE effectively encodes both explicit and implicit meanings and improves performance on downstream tasks.
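To make the idea of selecting between two embeddings in a shared space concrete, the following is a minimal sketch and not the DualCSE implementation. The class `DualEmbedding`, the function `rank`, and the three-dimensional vectors are all hypothetical and hand-written for illustration. A real system would obtain the vectors from a trained encoder.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class DualEmbedding:
    """Hypothetical container: two vectors for one sentence, in the same space."""
    explicit: Tuple[float, ...]
    implicit: Tuple[float, ...]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query: Sequence[float],
         corpus: Dict[str, DualEmbedding],
         mode: str) -> List[Tuple[str, float]]:
    """Rank sentences by similarity to the query, using the chosen semantics.

    Because both embeddings share one space, the same query vector can be
    compared against either the explicit or the implicit embedding.
    """
    if mode not in ("explicit", "implicit"):
        raise ValueError("mode must be 'explicit' or 'implicit'")
    scored = [(sid, cosine(query, getattr(emb, mode)))
              for sid, emb in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors, made up for illustration only (assumption, not real model output).
corpus = {
    "s1": DualEmbedding(explicit=(1.0, 0.0, 0.0), implicit=(0.0, 1.0, 0.0)),
    "s2": DualEmbedding(explicit=(0.0, 1.0, 0.0), implicit=(0.0, 0.0, 1.0)),
}
query = (0.1, 0.9, 0.0)

top_explicit = rank(query, corpus, "explicit")[0][0]
top_implicit = rank(query, corpus, "implicit")[0][0]
```

In this toy setup, the same query retrieves a different sentence depending on which of the two embeddings is selected, which is the kind of purpose-specific choice the abstract describes.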