Taking inspiration from set theory, we introduce SetCSE, an information retrieval framework that uses sets to represent complex semantics and provides well-defined operations for structured information querying within a given context. Within this framework, we propose an inter-set contrastive learning objective that strengthens a sentence embedding model's understanding of the given semantics. We further present a suite of operations, namely SetCSE intersection, difference, and operation series, that use sentence embeddings from the enhanced model to perform complex sentence retrieval tasks. Throughout this paper, we demonstrate three things. SetCSE follows the conventions that human language uses to express compound semantics. It significantly improves the discriminative capability of the underlying sentence embedding models. It also enables a range of information retrieval tasks involving complex, intricate prompts that existing querying methods cannot handle.
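The abstract does not state the formulas behind SetCSE intersection, difference, and operation series. The sketch below is only a rough illustration. It assumes that each semantic set is a list of sentence embeddings and that a candidate sentence is scored by its mean cosine similarity to each set. Under this rule, intersection ranks candidates by high similarity, difference ranks them by low similarity, and a series adds the "include" scores and subtracts the "exclude" scores. The scoring rule, the function names, and the toy vectors are all hypothetical and are not the paper's definitions.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def set_score(x: Vector, semantic_set: List[Vector]) -> float:
    """Mean cosine similarity of x to every embedding in a set (assumed scoring rule)."""
    return sum(cosine(x, s) for s in semantic_set) / len(semantic_set)


def operation_series(
    candidates: Dict[str, Vector],
    include: List[List[Vector]],
    exclude: List[List[Vector]],
) -> List[str]:
    """Hypothetical series: rank candidates close to every `include` set and far from every `exclude` set."""

    def score(x: Vector) -> float:
        pos = sum(set_score(x, s) for s in include)
        neg = sum(set_score(x, s) for s in exclude)
        return pos - neg

    return sorted(candidates, key=lambda key: score(candidates[key]), reverse=True)


def intersection(candidates: Dict[str, Vector], semantic_set: List[Vector]) -> List[str]:
    """Hypothetical intersection: most similar to the set first."""
    return operation_series(candidates, include=[semantic_set], exclude=[])


def difference(candidates: Dict[str, Vector], semantic_set: List[Vector]) -> List[str]:
    """Hypothetical difference: least similar to the set first."""
    return operation_series(candidates, include=[], exclude=[semantic_set])


# Toy 3-d "embeddings"; a real system would use the contrastively fine-tuned encoder.
candidates = {"s1": [1.0, 0.0, 0.0], "s2": [0.0, 1.0, 0.0], "s3": [0.7, 0.7, 0.0]}
set_b = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]]
set_c = [[0.0, 1.0, 0.0]]

print(operation_series(candidates, include=[set_b], exclude=[set_c]))  # ['s1', 's3', 's2']
```

Treating a set as a list of example sentences, instead of collapsing it to one query vector, mirrors the abstract's idea of representing a semantic by a set. In practice the scoring rule would be swapped for whatever definition the paper gives.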