Training a single model on multiple input domains and/or output tasks compresses information from multiple sources into a unified backbone and hence improves model efficiency. It also enables potential positive knowledge transfer across tasks/domains, leading to improved accuracy and data-efficient training. However, optimizing such networks is challenging, in particular due to discrepancies between the different tasks or domains. Despite several hypotheses and solutions proposed over the years, recent work has shown that uniform scalarization training, i.e., simply minimizing the average of the task losses, performs on par with more costly state-of-the-art (SotA) optimization methods. This raises the question of how well we understand the training dynamics of multi-task and multi-domain networks. In this work, we first devise a large-scale unified analysis of multi-domain and multi-task learning to better understand the dynamics of scalarization across varied task/domain combinations and model sizes. Building on these insights, we then propose to leverage population-based training to efficiently search for the optimal scalarization weights when dealing with a large number of tasks or domains.
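To make the two ingredients named above concrete, the following is a minimal sketch and not the paper's implementation. `scalarize` computes a weighted sum of task losses, where uniform weights recover the plain average. `pbt_step` shows one generic exploit/explore round of population-based training over the weight vectors. The function names, the `perturb` parameter, and the top-half/bottom-half selection rule are assumptions made for illustration; the abstract does not specify which PBT variant is used.

```python
import random


def scalarize(losses, weights=None):
    """Weighted sum of task losses; uniform weights 1/T give the plain average."""
    if weights is None:
        weights = [1.0 / len(losses)] * len(losses)
    if len(weights) != len(losses):
        raise ValueError("need one weight per task loss")
    return sum(w * l for w, l in zip(weights, losses))


def pbt_step(population, scores, rng, perturb=0.2):
    """One exploit/explore round over scalarization weights (higher score is better).

    Assumed selection rule: the bottom half of the population copies a weight
    vector from a random top-half member (exploit), multiplies each weight by a
    random factor in [1 - perturb, 1 + perturb], and renormalizes (explore).
    A full PBT loop would also copy the parent's model parameters.
    """
    order = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)
    half = max(1, len(order) // 2)
    top, bottom = order[:half], order[half:]
    new_population = [list(w) for w in population]
    for i in bottom:
        parent = population[rng.choice(top)]
        noisy = [w * rng.uniform(1 - perturb, 1 + perturb) for w in parent]
        total = sum(noisy)
        new_population[i] = [w / total for w in noisy]
    return new_population
```

For example, `scalarize([1.0, 3.0], [0.25, 0.75])` returns `2.5`, and calling `pbt_step(population, scores, random.Random(0))` after each evaluation interval replaces the weight vectors of the lower-scoring members while leaving the top half unchanged.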