We introduce SetBERT, a fine-tuned BERT-based model designed to enhance query embeddings for set operations and Boolean logic queries, such as Intersection (AND), Difference (NOT), and Union (OR). SetBERT significantly improves retrieval performance for logic-structured queries, an area where both traditional and neural retrieval methods typically underperform. We propose an innovative use of an inverted contrastive loss, which focuses on identifying the negative sentence, and fine-tune BERT on a dataset generated by prompting GPT. Furthermore, we demonstrate that, unlike other BERT-based models, fine-tuning with triplet loss actually degrades performance on this task. Our experiments reveal that SetBERT-base not only significantly outperforms BERT-base (by up to a 63% improvement in Recall) but also achieves performance comparable to the much larger BERT-large model, despite being only one-third of its size.
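To make the core idea concrete, below is a minimal, hypothetical sketch of what an inverted contrastive objective could look like: rather than training the model to pull the positive toward the anchor, the candidates are scored against the anchor and the cross-entropy target is the index of the *negative* sentence. The function name, signature, and the plain softmax formulation are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def inverted_contrastive_loss(anchor, candidates, neg_index, temperature=0.1):
    """Illustrative sketch (not the paper's exact loss): score each candidate
    sentence embedding against the anchor, then apply softmax cross-entropy
    with the NEGATIVE sentence as the target class, inverting the usual
    contrastive setup that targets the positive."""
    # Cosine similarity between the anchor and every candidate embedding.
    a = anchor / np.linalg.norm(anchor)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    logits = (c @ a) / temperature
    # Numerically stable softmax over the candidate set.
    logits = logits - logits.max()
    probs = np.exp(logits) / np.exp(logits).sum()
    # Cross-entropy with the negative sentence as the "correct" answer.
    return float(-np.log(probs[neg_index]))
```

Under this sketch, the loss is small when the candidate at `neg_index` is the one the embedding space already singles out, and large otherwise, so gradients push the encoder to make negatives identifiable.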