Annotating Data for Fine-Tuning a Neural Ranker? Current Active Learning Strategies are not Better than Random Selection

Search methods based on Pretrained Language Models (PLMs) have demonstrated large effectiveness gains over statistical and early neural ranking models. However, fine-tuning PLM-based rankers requires a large amount of annotated training data. Annotating data involves substantial manual effort and is therefore expensive, especially in domain-specific tasks. In this paper we investigate fine-tuning PLM-based rankers under limited training data and budget. We consider two scenarios: fine-tuning a ranker from scratch, and domain adaptation, where a ranker already fine-tuned on general data is further fine-tuned on a target dataset. We observe great variability in effectiveness when fine-tuning on different randomly selected subsets of training data. This suggests that effectiveness gains are achievable by actively selecting the subset of training data that has the most positive effect on the ranker, which would make it possible to fine-tune effective PLM rankers at a reduced annotation budget. To investigate this, we adapt existing Active Learning (AL) strategies to the task of fine-tuning PLM rankers and evaluate their effectiveness, also considering annotation and computational costs. Our extensive analysis shows that AL strategies do not significantly outperform random selection of training subsets in terms of effectiveness. We further find that the gains AL strategies do provide come at the expense of more assessments (and thus higher annotation costs), and that AL strategies underperform random selection when effectiveness is compared at a fixed annotation cost. Our results highlight that "optimal" subsets of training data that provide high effectiveness at low annotation cost do exist, but current mainstream AL strategies applied to PLM rankers are not capable of identifying them.
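To make the fixed-annotation-cost comparison concrete, the following is a minimal sketch of selecting training queries under an assessment budget, once with a random baseline and once with uncertainty sampling. Uncertainty sampling is used here only as one common AL strategy; the abstract does not name the strategies the authors adapted. All function names, the query ids, the predicted relevance values, and the per-query assessment counts are hypothetical toy values, not data or code from the paper.

```python
import math
import random


def binary_entropy(p):
    """Entropy of a Bernoulli(p) prediction; highest when p is 0.5."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def random_order(pool, rng):
    """Baseline: consider candidate queries in a uniformly random order."""
    order = list(pool)
    rng.shuffle(order)
    return order


def uncertainty_order(pool, predicted_relevance):
    """Uncertainty sampling: most uncertain queries (highest entropy) first."""
    return sorted(
        pool,
        key=lambda q: binary_entropy(predicted_relevance[q]),
        reverse=True,
    )


def select_within_budget(order, assessments_needed, budget):
    """Greedily take queries in `order` while the total number of
    assessments stays within `budget`; queries that do not fit are skipped."""
    selected, spent = [], 0
    for q in order:
        cost = assessments_needed[q]
        if spent + cost <= budget:
            selected.append(q)
            spent += cost
    return selected, spent


# Hypothetical toy data: six candidate queries, the current ranker's
# predicted relevance for each, and the assessments each would require.
pool = ["q1", "q2", "q3", "q4", "q5", "q6"]
predicted_relevance = {
    "q1": 0.95, "q2": 0.50, "q3": 0.10,
    "q4": 0.60, "q5": 0.99, "q6": 0.45,
}
assessments_needed = {"q1": 1, "q2": 4, "q3": 1, "q4": 3, "q5": 1, "q6": 3}

budget = 6  # total assessments we can afford
al_selected, al_spent = select_within_budget(
    uncertainty_order(pool, predicted_relevance), assessments_needed, budget
)
rnd_selected, rnd_spent = select_within_budget(
    random_order(pool, random.Random(0)), assessments_needed, budget
)
```

Both selections respect the same assessment budget, so rankers fine-tuned on `al_selected` and `rnd_selected` can be compared at equal annotation cost rather than at an equal number of training queries.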
