AdaTask: A Task-aware Adaptive Learning Rate Approach to Multi-task Learning

Multi-task learning (MTL) models have demonstrated impressive results in computer vision, natural language processing, and recommender systems. Even though many approaches have been proposed, how well these approaches balance different tasks on each parameter remains unclear. In this paper, we propose to measure the task dominance degree of a parameter by the total updates of each task on this parameter. Specifically, we compute the total updates by the exponentially decaying Average of the squared Updates (AU) on a parameter from the corresponding task. Based on this novel metric, we observe that many parameters in existing MTL methods, especially those in the higher shared layers, are still dominated by one or several tasks. The dominance of AU is mainly due to the dominance of accumulative gradients from one or several tasks. Motivated by this, we propose a Task-wise Adaptive learning rate approach, AdaTask in short, which separates the accumulative gradients, and hence the learning rate, of each task for each parameter in adaptive learning rate approaches (e.g., AdaGrad, RMSProp, and Adam). Comprehensive experiments on computer vision and recommender system MTL datasets demonstrate that AdaTask significantly improves the performance of dominated tasks, resulting in state-of-the-art (SOTA) average task-wise performance. Analysis on both synthetic and real-world datasets shows that AdaTask balances parameters well in every shared layer.
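The idea of separating accumulative gradients per task can be illustrated with a small toy example. The abstract does not give the exact update rule, so the following is only a minimal sketch under stated assumptions: an RMSProp-style accumulator, a single shared scalar parameter, and two tasks with constant gradients of very different magnitude. All function names (`au_update`, `shared_step`, `taskwise_step`, `simulate`) and hyperparameter values are hypothetical and chosen for illustration.

```python
import math

def au_update(au, update, beta=0.9):
    # Exponentially decaying average of the squared update (AU) for one task.
    # beta is an assumed decay rate, not a value taken from the paper.
    return beta * au + (1 - beta) * update * update

def shared_step(grads, G, lr=0.01, gamma=0.9, eps=1e-8):
    # Conventional RMSProp-style step: one accumulator G built from the
    # summed gradient, so every task is scaled by the same denominator.
    g = sum(grads)
    G = gamma * G + (1 - gamma) * g * g
    denom = math.sqrt(G) + eps
    return [lr * gk / denom for gk in grads], G

def taskwise_step(grads, G, lr=0.01, gamma=0.9, eps=1e-8):
    # Task-wise variant (assumed form): each task k keeps its own
    # accumulator G[k], and hence its own effective learning rate.
    G = [gamma * Gk + (1 - gamma) * gk * gk for Gk, gk in zip(G, grads)]
    updates = [lr * gk / (math.sqrt(Gk) + eps) for gk, Gk in zip(grads, G)]
    return updates, G

def simulate(step, G0, grads=(10.0, 0.1), n_steps=50):
    # Feed constant per-task gradients and track each task's AU.
    G, au = G0, [0.0] * len(grads)
    for _ in range(n_steps):
        updates, G = step(list(grads), G)
        au = [au_update(a, u) for a, u in zip(au, updates)]
    return au

au_shared = simulate(shared_step, 0.0)
au_taskwise = simulate(taskwise_step, [0.0, 0.0])
print("AU ratio, shared accumulator:   ", au_shared[0] / au_shared[1])
print("AU ratio, task-wise accumulators:", au_taskwise[0] / au_taskwise[1])
```

In this toy setting the shared accumulator lets the large-gradient task dominate the AU by several orders of magnitude, while the task-wise accumulators bring the two AU values close to each other. Real task gradients are neither constant nor scalar, so this shows only the mechanism, not the behavior reported in the paper.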

