Transformer-based retrieval and reranking models for text document search are often refined through knowledge distillation combined with contrastive learning. Tight distribution matching between the teacher and student models can be difficult, because over-calibration to the teacher may degrade training effectiveness when the teacher does not perform well. This paper contrastively reweights the KL divergence terms so that the alignment between student and teacher prioritizes the proper separation of positive and negative documents. It analyzes the proposed loss function and evaluates it on the MS MARCO and BEIR datasets to demonstrate its effectiveness in improving the relevance of the tested student models.
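The abstract does not state the loss formula, so the following is only an illustrative sketch of the general idea of reweighting per-document KL divergence terms, not the loss proposed in the paper. The function names (`kl_terms`, `contrastive_weights`, `weighted_kl`), the `boost` parameter, and the weighting rule itself (upweight documents the student currently places on the wrong side of the positive/negative boundary) are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def kl_terms(teacher_scores, student_scores):
    """Per-document terms p_i * log(p_i / q_i) of KL(teacher || student)."""
    p = softmax(teacher_scores)
    q = softmax(student_scores)
    return [pi * math.log(pi / qi) for pi, qi in zip(p, q)]


def contrastive_weights(student_scores, is_positive, boost=2.0):
    """Hypothetical weighting rule (an assumption, not the paper's):
    a positive scored below some negative, or a negative scored above
    some positive, receives weight `boost`; all others receive 1.0."""
    pos = [s for s, y in zip(student_scores, is_positive) if y]
    neg = [s for s, y in zip(student_scores, is_positive) if not y]
    if not pos or not neg:
        return [1.0] * len(student_scores)
    min_pos, max_neg = min(pos), max(neg)
    weights = []
    for s, y in zip(student_scores, is_positive):
        misordered = (s < max_neg) if y else (s > min_pos)
        weights.append(boost if misordered else 1.0)
    return weights


def weighted_kl(teacher_scores, student_scores, weights):
    """Sum of the per-document KL terms, each scaled by its weight."""
    terms = kl_terms(teacher_scores, student_scores)
    return sum(w * t for w, t in zip(weights, terms))


# Toy query with one positive (index 0) and two negatives.
teacher = [3.0, 1.0, 0.5]
student = [0.2, 0.9, 0.1]
labels = [True, False, False]
w = contrastive_weights(student, labels)
loss = weighted_kl(teacher, student, w)
```

With uniform weights this reduces to the ordinary KL divergence between the teacher and student softmax distributions. With non-uniform weights the sum is no longer guaranteed to be non-negative, since individual KL terms can be negative; a practical loss would need to account for that.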