Transferring learned patterns from pretrained neural language models has been shown to significantly improve effectiveness across a variety of language-based tasks. Further tuning on intermediate tasks has also been shown to provide additional performance benefits, provided that the intermediate task is sufficiently related to the target task. However, identifying related tasks remains an open problem, and a brute-force search over task combinations is prohibitively expensive. This raises the question: can selective fine-tuning improve both effectiveness and efficiency on tasks with no training examples? In this paper, we explore statistical measures that approximate the divergence between domain representations as a means of estimating whether tuning on one task pair will yield greater performance benefits than tuning on another. These estimates can then reduce the number of task pairs that need to be tested by eliminating pairs that are unlikely to provide benefits. Through experiments over 58 tasks and more than 6,600 task pair combinations, we demonstrate that statistical measures can distinguish effective task pairs and that the resulting estimates can reduce end-to-end runtime by up to 40%.
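The abstract does not name the specific statistical measures used. As a minimal sketch, assuming Jensen-Shannon divergence between unigram term distributions as one possible divergence measure (an illustrative choice, not necessarily the paper's), the pruning idea can be shown as follows: score each candidate intermediate task by its divergence from the target task's text, then keep only the closest candidates for actual fine-tuning. The function names, the whitespace tokenization, and the `keep` parameter are all hypothetical.

```python
import math
from collections import Counter


def term_distribution(texts):
    """Unigram probability distribution over lowercased whitespace tokens (hypothetical helper)."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits, so the result lies in [0, 1]."""
    vocab = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2 over the shared vocabulary.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl_to_m(a):
        # KL(a || M); only tokens present in `a` (probability > 0) contribute.
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def prune_intermediate_tasks(target_texts, candidates, keep=2):
    """Rank candidate intermediate tasks by divergence from the target task
    (lower = more related) and return the names of the `keep` closest ones.
    Hypothetical pruning rule: everything else is never fine-tuned."""
    target = term_distribution(target_texts)
    scored = sorted(
        (jensen_shannon(term_distribution(texts), target), name)
        for name, texts in candidates.items()
    )
    return [name for _, name in scored[:keep]]


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    target = ["the movie was great", "terrible acting in this film"]
    candidates = {
        "reviews": ["a great film with fine acting", "the movie was boring"],
        "news": ["stocks fell sharply on monday", "the central bank raised rates"],
        "qa": ["who directed the film", "what year was the movie released"],
    }
    print(prune_intermediate_tasks(target, candidates, keep=2))
```

Ranking by a cheap divergence score means only the retained task pairs pay the cost of fine-tuning; a real system would use richer domain representations than unigram counts, and the cutoff (a fixed `keep` here, rather than a divergence threshold) is a design choice left open.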