Improving Multi-task Learning via Seeking Task-based Flat Regions

Multi-Task Learning (MTL) is a widely used and powerful paradigm for training deep neural networks that allows more than one objective to be learned with a single backbone. Compared to training tasks separately, MTL significantly reduces computational costs, improves data efficiency, and potentially enhances model performance by leveraging knowledge across tasks. Hence, it has been adopted in a variety of applications, ranging from computer vision to natural language processing and speech recognition. Within MTL, an emerging line of work focuses on manipulating the task gradients to derive a final gradient descent direction that benefits all tasks. Despite achieving impressive results on many benchmarks, directly applying these approaches without appropriate regularization techniques might lead to suboptimal solutions on real-world problems. In particular, standard training that minimizes the empirical loss on the training data can easily overfit to low-resource tasks or be spoiled by tasks with noisy labels, which can cause negative transfer between tasks and an overall performance drop. To alleviate such problems, we propose to leverage a recently introduced training method, Sharpness-aware Minimization, which can enhance a model's generalization ability in single-task learning. Accordingly, we present a novel MTL training methodology that encourages the model to find task-based flat minima, coherently improving its generalization capability on all tasks. Finally, we conduct comprehensive experiments on a variety of applications to demonstrate the merit of applying our proposed approach to existing gradient-based MTL methods, as suggested by our developed theory.
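To make the idea of task-based flat minima more concrete, here is a minimal NumPy sketch and not the paper's algorithm. It assumes two toy quadratic task losses over a shared parameter vector, applies a Sharpness-aware Minimization style perturbation separately for each task, and combines the resulting gradients by plain averaging as a stand-in for a gradient-based MTL aggregation rule. The names `task_losses`, `task_grads`, `sam_mtl_step`, `rho`, and `lr` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical toy setup: each task pulls the shared parameters toward its own target.
TARGETS = np.array([[1.0, 0.0], [0.0, 1.0]])


def task_losses(w):
    """Per-task quadratic losses 0.5 * ||w - t_i||^2 on shared parameters w."""
    return np.array([0.5 * np.sum((w - t) ** 2) for t in TARGETS])


def task_grads(w):
    """Per-task gradients of the quadratic losses, one row per task."""
    return np.array([w - t for t in TARGETS])


def sam_mtl_step(w, lr=0.1, rho=0.05, eps=1e-12):
    """One update: per-task SAM-style gradients, then simple averaging (an assumption)."""
    sam_grads = []
    for i, g in enumerate(task_grads(w)):
        # Step a distance rho along the normalized gradient of task i
        # to reach an approximate worst-case neighbor for that task.
        w_adv = w + rho * g / (np.linalg.norm(g) + eps)
        # Use the gradient of task i evaluated at the perturbed point.
        sam_grads.append(task_grads(w_adv)[i])
    # Averaging stands in for whatever gradient-aggregation rule is used.
    return w - lr * np.mean(sam_grads, axis=0)


w = np.array([2.0, 2.0])
start = task_losses(w).sum()
for _ in range(100):
    w = sam_mtl_step(w)
end = task_losses(w).sum()
```

On this symmetric toy problem the iterates settle near the midpoint of the two targets. In a real setting the averaging step would be replaced by the chosen gradient-manipulation method, and the losses by the network's per-task objectives.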


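Related content

Multi-task learning

Multi-task learning (MTL) is a subfield of machine learning in which multiple learning tasks are solved at the same time while exploiting the commonalities and differences across tasks. Compared with training models separately, this can improve the learning efficiency and prediction accuracy of the task-specific models. Multi-task learning is an approach to inductive transfer: it improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias. It does so by learning tasks in parallel with a shared representation, so that what is learned for each task can help the other tasks be learned better.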
