As machine learning becomes more prominent, there is a growing demand to perform several inference tasks in parallel. Running a dedicated model for each task is computationally expensive, and therefore there is great interest in multi-task learning (MTL). MTL aims to learn a single model that solves several tasks efficiently. MTL models are often optimized by computing a single gradient per task and aggregating these gradients to obtain a combined update direction. However, these approaches do not consider an important aspect: the sensitivity in the gradient dimensions. Here, we introduce a novel gradient aggregation approach using Bayesian inference. We place a probability distribution over the task-specific parameters, which in turn induces a distribution over the gradients of the tasks. This additional information allows us to quantify the uncertainty in each of the gradient dimensions, which can then be factored in when aggregating them. We empirically demonstrate the benefits of our approach on a variety of datasets, achieving state-of-the-art performance.
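To make the idea of uncertainty-aware aggregation concrete, the following is a minimal illustrative sketch, not the method of this work, whose exact aggregation rule is not given in the abstract. It assumes each task's gradient is summarized by a per-dimension mean and variance (as if drawn from independent Gaussians) and combines them by inverse-variance (precision) weighting, so that dimensions in which a task is uncertain contribute less. The function name `precision_weighted_aggregate` and the toy values are hypothetical.

```python
import numpy as np

def precision_weighted_aggregate(means, variances, eps=1e-8):
    """Combine per-task gradient means, weighting each dimension by inverse variance.

    Assumption (not from the source): each task gradient is an independent
    Gaussian per dimension, described by `means` and `variances`.
    """
    means = np.asarray(means, dtype=float)          # shape (num_tasks, dim)
    variances = np.asarray(variances, dtype=float)  # shape (num_tasks, dim)
    precisions = 1.0 / (variances + eps)            # eps guards against zero variance
    # Normalize precisions across tasks, separately for every dimension.
    weights = precisions / precisions.sum(axis=0, keepdims=True)
    return (weights * means).sum(axis=0)            # shape (dim,)

# Toy example with two tasks and two gradient dimensions (made-up numbers).
task_grad_means = np.array([[1.0, -2.0],
                            [3.0,  2.0]])
task_grad_vars = np.array([[0.1, 10.0],   # task 0 is very uncertain in dimension 1
                           [0.1,  0.1]])
update = precision_weighted_aggregate(task_grad_means, task_grad_vars)
print(update)
```

In this toy case the two tasks disagree in sign on the second dimension, and the aggregate follows the task with the lower variance there, whereas a plain average would cancel the two gradients out.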