In this paper, we empirically study the optimization dynamics of multi-task learning, focusing on the dynamics that govern a collection of tasks with significant data imbalance. We present a simple yet effective method: pre-training on high-resource tasks, followed by fine-tuning on a mixture of high- and low-resource tasks. We provide a thorough empirical study and analysis of this method's benefits, showing that it achieves consistent improvements relative to the performance trade-off profile of standard static weighting. We analyze the data regimes under which this method is applicable and demonstrate its improvements empirically in neural machine translation (NMT) and multilingual language modeling.
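The two-phase schedule described above can be illustrated with a minimal sketch of a task sampler. This is not the authors' implementation: the function name `make_task_sampler`, the task names `en-fr` and `en-ro`, the step count, and the uniform mixture weights are all hypothetical choices made for illustration, since the abstract does not specify the mixture proportions or the length of the pre-training phase.

```python
import random


def make_task_sampler(high_resource, low_resource, pretrain_steps,
                      mixture_weights, seed=0):
    """Return a function mapping a training step to a task name.

    Hypothetical sketch of a two-phase schedule:
      * step <  pretrain_steps: sample only from high-resource tasks
      * step >= pretrain_steps: sample from all tasks using fixed
        (static) mixture weights
    """
    rng = random.Random(seed)
    high_resource = list(high_resource)
    all_tasks = high_resource + list(low_resource)
    # Assumed: mixture_weights holds one non-negative weight per task.
    weights = [mixture_weights[task] for task in all_tasks]

    def sample_task(step):
        if step < pretrain_steps:
            # Pre-training phase: high-resource tasks only.
            return rng.choice(high_resource)
        # Fine-tuning phase: mixture of high- and low-resource tasks.
        return rng.choices(all_tasks, weights=weights, k=1)[0]

    return sample_task


# Illustrative usage with made-up task names and weights.
sampler = make_task_sampler(
    high_resource=["en-fr"],
    low_resource=["en-ro"],
    pretrain_steps=3,
    mixture_weights={"en-fr": 0.5, "en-ro": 0.5},
)
schedule = [sampler(step) for step in range(6)]
```

In this sketch the first three entries of `schedule` are always the high-resource task, and the remaining entries are drawn from the mixture. A real training loop would use the sampled task name to pick which dataset supplies the next batch.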