Multi-Task Learning (MTL) is a widely used and powerful paradigm for training deep neural networks that allows a single backbone to learn more than one objective. Compared to training tasks separately, MTL significantly reduces computational costs, improves data efficiency, and can enhance model performance by leveraging knowledge across tasks. Hence, it has been adopted in a variety of applications, ranging from computer vision to natural language processing and speech recognition. An emerging line of work in MTL focuses on manipulating the task gradients to derive a single gradient descent direction that benefits all tasks. Although these approaches achieve impressive results on many benchmarks, applying them directly without appropriate regularization might lead to suboptimal solutions on real-world problems. In particular, standard training that minimizes the empirical loss on the training data can easily overfit to low-resource tasks or be spoiled by noisy-labeled ones, which can cause negative transfer between tasks and an overall drop in performance. To alleviate such problems, we propose to leverage a recently introduced training method, Sharpness-aware Minimization, which can enhance model generalization in single-task learning. Accordingly, we present a novel MTL training methodology that encourages the model to find task-based flat minima, coherently improving its generalization capability on all tasks. Finally, we conduct comprehensive experiments on a variety of applications to demonstrate the merit of our proposed approach over existing gradient-based MTL methods, as suggested by our developed theory.
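To make the idea of combining sharpness-aware updates with gradient-based MTL more concrete, the following is a minimal sketch and not the algorithm proposed in this work. It assumes two toy quadratic tasks sharing one parameter vector, applies the standard Sharpness-aware Minimization step (perturb the weights along the normalized gradient, then take the gradient at the perturbed point) to each task separately, and uses plain averaging as a stand-in for whichever gradient-aggregation rule an MTL method would use. All function names (`make_quadratic_task`, `sam_gradient`, `mtl_sam_step`) and the values of `lr` and `rho` are hypothetical choices for illustration.

```python
import numpy as np


def make_quadratic_task(A, c):
    """Return (loss, grad) for the toy task 0.5 * (w - c)^T A (w - c)."""
    A = np.asarray(A, dtype=float)
    c = np.asarray(c, dtype=float)

    def loss(w):
        d = w - c
        return 0.5 * d @ A @ d

    def grad(w):
        # Exact gradient of the quadratic; valid because A is symmetric.
        return A @ (w - c)

    return loss, grad


def sam_gradient(grad_fn, w, rho):
    """Standard SAM gradient for one task.

    Step 1: move to the approximate worst-case point inside an L2 ball
            of radius rho around w (along the normalized gradient).
    Step 2: return the gradient evaluated at that perturbed point.
    """
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    return grad_fn(w + eps)


def mtl_sam_step(w, grad_fns, lr=0.1, rho=0.05):
    """One update of the shared parameters using per-task SAM gradients.

    Assumption: per-task gradients are combined by simple averaging;
    a gradient-based MTL method would replace this line with its own rule.
    """
    task_grads = [sam_gradient(g, w, rho) for g in grad_fns]
    combined = np.mean(task_grads, axis=0)
    return w - lr * combined


# Two toy tasks with different curvature and different optima.
tasks = [
    make_quadratic_task(np.diag([1.0, 10.0]), [1.0, 0.0]),
    make_quadratic_task(np.diag([5.0, 1.0]), [0.0, 1.0]),
]
losses = [t[0] for t in tasks]
grads = [t[1] for t in tasks]

w = np.array([3.0, -2.0])
initial_total = sum(loss(w) for loss in losses)
for _ in range(200):
    w = mtl_sam_step(w, grads)
final_total = sum(loss(w) for loss in losses)

print("total loss before:", initial_total)
print("total loss after: ", final_total)
```

Setting `rho=0` reduces `sam_gradient` to the ordinary task gradient, so the same loop can serve as a baseline when comparing against the sharpness-aware variant.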