To improve the performance of the dual-encoder retriever, one effective approach is knowledge distillation from the cross-encoder ranker. Existing works construct the candidate passages following the supervised learning setting, where a query is paired with a positive passage and a batch of negatives. However, through empirical observation, we find that even the hard negatives from advanced methods are still too trivial for the teacher to distinguish, preventing the teacher from transferring abundant dark knowledge to the student through its soft labels. To alleviate this issue, we propose ADAM, a knowledge distillation framework that better transfers the dark knowledge held in the teacher with Adaptive Dark exAMples. Unlike previous works that rely only on one positive and hard negatives as candidate passages, we create dark examples that all have moderate relevance to the query by mixing up and masking passages in discrete space. Furthermore, since the quality of the knowledge held in different training instances varies, as measured by the teacher's confidence score, we propose a self-paced distillation strategy that adaptively concentrates on a subset of high-quality instances to conduct dark-example-based knowledge distillation, helping the student learn better. We conduct experiments on two widely used benchmarks and verify the effectiveness of our method.
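To make the dark-example construction concrete, below is a minimal sketch of how "mixing up" and "masking" in discrete token space might look. The function names, the per-token mixing scheme, and the ratios are illustrative assumptions, not the paper's exact recipe; the only commitments taken from the abstract are that the operations act on discrete tokens and produce passages of moderate relevance to the query.

```python
import random

MASK_TOKEN = "[MASK]"  # assumed BERT-style mask token; the actual vocabulary may differ


def mix_up(pos_tokens, neg_tokens, mix_ratio=0.5, seed=None):
    """Build a dark example by splicing tokens from a hard negative into the
    positive passage, so the result is only moderately relevant to the query.
    (Illustrative: the paper's mixing operator may differ.)"""
    rng = random.Random(seed)
    mixed = []
    for i, tok in enumerate(pos_tokens):
        if i < len(neg_tokens) and rng.random() < mix_ratio:
            mixed.append(neg_tokens[i])  # borrow the aligned token from the negative
        else:
            mixed.append(tok)            # keep the positive's token
    return mixed


def mask(pos_tokens, mask_ratio=0.3, seed=None):
    """Build a dark example by masking a fraction of the positive's tokens,
    degrading its relevance in discrete space rather than in embeddings."""
    rng = random.Random(seed)
    return [MASK_TOKEN if rng.random() < mask_ratio else tok for tok in pos_tokens]
```

In a distillation pipeline, such dark examples would be appended to the candidate list and scored by the cross-encoder teacher, so its soft label spreads probability mass over candidates of graded relevance instead of one easy positive and trivially wrong negatives.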
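The self-paced strategy can likewise be sketched as a confidence-gated distillation loss. The following is a minimal sketch under stated assumptions: KL-divergence distillation over a per-query candidate list, the teacher's max softmax probability as its confidence score, and a keep-quantile that relaxes over training so harder instances are admitted later. The function name and schedule are hypothetical, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def self_paced_kd_loss(student_logits, teacher_logits, step, total_steps,
                       tau=1.0, start_quantile=0.5):
    """KL distillation over candidate passages, restricted to instances whose
    teacher confidence clears a self-paced threshold.

    student_logits, teacher_logits: [batch, num_candidates]
    Confidence = teacher's max softmax probability per instance (an assumption).
    The keep-quantile decays from `start_quantile` to 0, so training starts on
    the most confident (highest-quality) instances and gradually uses them all.
    """
    teacher_probs = F.softmax(teacher_logits / tau, dim=-1)
    confidence = teacher_probs.max(dim=-1).values               # [batch]

    # Self-paced schedule: begin with the top half, end with every instance.
    quantile = start_quantile * (1.0 - step / total_steps)
    threshold = torch.quantile(confidence, quantile)
    keep = (confidence >= threshold).float()                    # [batch] 0/1 mask

    log_student = F.log_softmax(student_logits / tau, dim=-1)
    per_instance_kl = F.kl_div(log_student, teacher_probs,
                               reduction="none").sum(dim=-1)    # [batch]
    return (keep * per_instance_kl).sum() / keep.sum().clamp(min=1.0)
```

Gating on the teacher's confidence rather than the student's loss is the design choice implied by the abstract: instances the teacher itself cannot score confidently carry low-quality soft labels, so deferring them keeps early distillation signal clean.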