Fine-tuning in information retrieval systems based on pre-trained language models (PLM-based IR) requires learning query representations and query-document relations, in addition to downstream task-specific learning. This study introduces coarse-tuning, an intermediate learning stage that bridges pre-training and fine-tuning. By learning query representations and query-document relations during coarse-tuning, we aim to reduce the burden on fine-tuning and improve learning on downstream IR tasks. We propose Query-Document Pair Prediction (QDPP), a coarse-tuning task that predicts the appropriateness of query-document pairs. Evaluation experiments show that the proposed method significantly improves MRR and/or nDCG@5 on four ad-hoc document retrieval datasets. Furthermore, the results of a query prediction task suggest that coarse-tuning facilitates the learning of query representations and query-document relations.
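To make the QDPP objective concrete, the following is a minimal sketch of one plausible formulation: a BERT-style cross-encoder that classifies whether a (query, document) pair is appropriate. The choice of `bert-base-uncased`, the binary cross-entropy loss, and the toy example pairs are illustrative assumptions, not the exact setup or pair-construction procedure used in the paper.

```python
# Sketch of a QDPP-style objective: score whether a (query, document) pair is
# "appropriate", using a BERT-style cross-encoder with a binary classification head.
# Model choice, loss, and pair construction are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # label 1 = appropriate pair, 0 = not
)

# Toy batch: one appropriate pair and one mismatched (inappropriate) pair.
queries = ["symptoms of influenza", "symptoms of influenza"]
documents = [
    "Influenza typically causes fever, cough, and muscle aches.",
    "The 2024 Olympic Games were held in Paris.",
]
labels = torch.tensor([1, 0])

# Encode each pair as a single sequence: [CLS] query [SEP] document [SEP]
inputs = tokenizer(queries, documents, padding=True, truncation=True,
                   max_length=256, return_tensors="pt")

outputs = model(**inputs, labels=labels)
print(outputs.loss)                    # cross-entropy over the two classes
print(outputs.logits.softmax(dim=-1))  # per-pair appropriateness scores
```

Under this reading, coarse-tuning would run such a pair-prediction objective over query-document data before task-specific fine-tuning, so the encoder already carries query representations and query-document relations into the downstream IR task.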