Multi-task learning (MTL) has been widely applied in online advertising and recommender systems. To address negative transfer, recent studies have proposed optimization methods that focus exclusively on aligning gradient directions or magnitudes. However, prior work has shown that general and task-specific knowledge coexist in the limited shared capacity, so overemphasizing gradient alignment may crowd out task-specific knowledge, and vice versa. In this paper, we propose CoGrad, a transference-driven approach that adaptively maximizes knowledge transference via Coordinated Gradient modification. We explicitly quantify transference as the loss reduction that one task induces on another, and then derive an auxiliary gradient from optimizing it. We incorporate this gradient into the original task gradients, so that the model automatically maximizes inter-task transfer while minimizing individual losses. CoGrad can thus balance general and specific knowledge to boost overall performance. In addition, we introduce an efficient approximation of the Hessian matrix, making CoGrad computationally efficient and simple to implement. Both offline and online experiments verify that CoGrad significantly outperforms previous methods.
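The idea of treating transference as a loss reduction and folding its gradient into the task gradients can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the formulation used by CoGrad: the two quadratic toy losses, the one-step lookahead definition of transference, the weight `alpha`, and all function names are hypothetical. In particular, the Hessian-vector product here uses a generic finite-difference approximation, since the abstract does not specify which Hessian approximation CoGrad uses.

```python
# Toy sketch: two quadratic task losses over one shared parameter vector.
# All names (CENTERS, loss, grad, transference, ...) are hypothetical.

CENTERS = [[1.0, 0.0], [0.0, 1.0]]  # assumed per-task optima


def loss(theta, k):
    """L_k(theta) = 0.5 * ||theta - c_k||^2."""
    return 0.5 * sum((t - c) ** 2 for t, c in zip(theta, CENTERS[k]))


def grad(theta, k):
    """Analytic gradient of L_k."""
    return [t - c for t, c in zip(theta, CENTERS[k])]


def transference(theta, i, j, eta):
    """Loss reduction on task j after one gradient step on task i."""
    g_i = grad(theta, i)
    lookahead = [t - eta * g for t, g in zip(theta, g_i)]
    return loss(theta, j) - loss(lookahead, j)


def hvp(theta, k, v, eps=1e-4):
    """Finite-difference approximation of the product H_k(theta) v."""
    shifted = [t + eps * x for t, x in zip(theta, v)]
    return [(a - b) / eps for a, b in zip(grad(shifted, k), grad(theta, k))]


def transference_grad(theta, i, j, eta):
    """Gradient of transference(theta, i, j, eta) w.r.t. theta (chain rule):
    g_j(theta) - (I - eta * H_i(theta)) g_j(theta - eta * g_i(theta))."""
    g_i = grad(theta, i)
    lookahead = [t - eta * g for t, g in zip(theta, g_i)]
    g_j_ahead = grad(lookahead, j)
    h = hvp(theta, i, g_j_ahead)
    return [a - b + eta * c for a, b, c in zip(grad(theta, j), g_j_ahead, h)]


def coordinated_step(theta, lr=0.1, eta=0.1, alpha=0.5):
    """Descend the summed task losses while ascending pairwise transference."""
    n = len(CENTERS)
    direction = [0.0] * len(theta)
    for k in range(n):
        direction = [d + g for d, g in zip(direction, grad(theta, k))]
    for i in range(n):
        for j in range(n):
            if i != j:
                aux = transference_grad(theta, i, j, eta)
                # Subtracting the auxiliary gradient means ascending transference.
                direction = [d - alpha * a for d, a in zip(direction, aux)]
    return [t - lr * d for t, d in zip(theta, direction)]
```

In this toy setting, `transference([2.0, 2.0], 0, 1, 0.1)` is approximately 0.375, meaning a step on task 0 also lowers the loss of task 1. The finite-difference product avoids forming a Hessian explicitly and costs one extra gradient evaluation per task pair, which is one common way to keep second-order terms cheap.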