Pre-trained language models and their fine-tuning have brought significant improvements to most downstream NLP tasks. Unsupervised training of a language model followed by fine-tuning on the target task has become the standard procedure for QA. In this work, we demonstrate that this strategy is sub-optimal for fine-tuning QA models, especially under a low QA annotation budget, which is a common setting in practice because of the cost of labeling extractive QA data. We draw our conclusions from an exhaustive analysis of how alternatives to the sequential fine-tuning strategy perform on different QA datasets. Our experiments show that the best strategy for fine-tuning a QA model in low-budget settings is to take a pre-trained language model (PLM) and fine-tune it on a dataset composed of the target dataset and the SQuAD dataset. With zero extra annotation effort, the best strategy outperforms the standard strategy by 2.28% to 6.48%. Our experiments provide one of the first investigations into how best to fine-tune a QA system under a low budget and are therefore of the utmost practical interest to QA practitioners.
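The best-performing strategy in the abstract, fine-tuning the PLM on the target dataset combined with SQuAD, amounts to a data-preparation step before a single fine-tuning run. The following is a minimal sketch under stated assumptions: the function name `build_mixed_training_set`, the `source` tag, the seeded shuffle, and the toy examples are all hypothetical and are not taken from the paper, and the abstract does not say how the two datasets are combined or in what proportion.

```python
import random


def build_mixed_training_set(target_examples, squad_examples, seed=0):
    """Merge target and SQuAD examples into one fine-tuning set.

    Assumption: plain concatenation followed by a seeded shuffle.
    The abstract does not specify the mixing procedure.
    """
    # Tag each example with its origin so the mix can be inspected later.
    mixed = [dict(ex, source="target") for ex in target_examples]
    mixed += [dict(ex, source="squad") for ex in squad_examples]
    # Shuffle so batches interleave both sources instead of seeing them in sequence.
    random.Random(seed).shuffle(mixed)
    return mixed


# Toy, made-up examples in an extractive-QA shape (question, context, answer span).
target_examples = [
    {"question": "Which drug was given?", "context": "The patient received aspirin.",
     "answer_text": "aspirin", "answer_start": 21},
]
squad_examples = [
    {"question": "Where is the tower?", "context": "The tower is in Paris.",
     "answer_text": "Paris", "answer_start": 16},
    {"question": "Who wrote it?", "context": "It was written by Ada.",
     "answer_text": "Ada", "answer_start": 18},
]

mixed = build_mixed_training_set(target_examples, squad_examples, seed=13)
```

The resulting `mixed` list would then be passed to whatever QA fine-tuning routine is in use. This differs from the sequential alternative, in which the PLM is first fine-tuned on SQuAD and only afterwards on the target data.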