Transferring learned patterns from pretrained neural language models has been shown to significantly improve effectiveness across a variety of language-based tasks. Further tuning on intermediate tasks has been demonstrated to provide additional performance benefits, provided the intermediate task is sufficiently related to the target task. However, how to identify related tasks is an open problem, and brute-force search for effective task combinations is prohibitively expensive. Hence the question arises: can we improve the effectiveness and efficiency of tasks with no training examples through selective fine-tuning? In this paper, we explore statistical measures that approximate the divergence between domain representations as a means of estimating whether tuning on one task pair will yield performance benefits over tuning on another. These estimates can then be used to reduce the number of task pairs that need to be tested by eliminating pairs that are unlikely to provide benefits. Through experiments over 58 tasks and more than 6,600 task pair combinations, we demonstrate that statistical measures can distinguish effective task pairs, and that the resulting estimates can reduce end-to-end runtime by up to 40%.
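To make the idea of divergence-based filtering concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes that each task's "domain representation" is simply a unigram term distribution over its input texts, and it uses Jensen-Shannon divergence as a stand-in for the statistical measures the abstract refers to. The function names (`term_distribution`, `js_divergence`, `prune_candidates`) and the `keep_fraction` parameter are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def term_distribution(texts):
    """Unigram distribution over whitespace tokens (an assumed, simplistic
    stand-in for a task's domain representation)."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two sparse distributions.
    Returns 0.0 for identical distributions and 1.0 for disjoint ones."""
    div = 0.0
    for tok in set(p) | set(q):
        pi = p.get(tok, 0.0)
        qi = q.get(tok, 0.0)
        mi = 0.5 * (pi + qi)  # mixture distribution M = (P + Q) / 2
        if pi > 0:
            div += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            div += 0.5 * qi * math.log2(qi / mi)
    return div


def prune_candidates(target_texts, candidates, keep_fraction=0.6):
    """Rank candidate intermediate tasks by divergence from the target and
    keep only the closest fraction (keep_fraction is a hypothetical knob)."""
    target = term_distribution(target_texts)
    scored = sorted(
        (js_divergence(term_distribution(texts), target), name)
        for name, texts in candidates.items()
    )
    n_keep = max(1, math.ceil(keep_fraction * len(scored)))
    return [name for _, name in scored[:n_keep]]


if __name__ == "__main__":
    target = ["the movie was great", "a terrible film"]
    candidates = {
        "reviews": ["the film was great", "a terrible movie"],
        "code": ["def foo return bar", "import os sys"],
    }
    # Only the candidates closest to the target would go on to fine-tuning.
    print(prune_candidates(target, candidates, keep_fraction=0.5))
```

Only unlabeled input text from the target task is needed to compute the divergence, which is what makes this kind of filter usable when the target has no training examples. Whether a low divergence actually predicts a fine-tuning benefit is the empirical question the paper investigates; the sketch only illustrates the filtering step.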