We aim to address Multi-Task Learning (MTL) with a large number of tasks by Multi-Task Grouping (MTG). Given N tasks, we propose to simultaneously identify the best task groups from 2^N candidates and train the model weights in one shot, with the high-order task affinity fully exploited. This is distinct from the pioneering methods that sequentially identify the groups and then train the model weights, where the group identification often relies on heuristics. As a result, our method not only improves the training efficiency, but also mitigates the objective bias introduced by the sequential procedures, which potentially leads to a suboptimal solution. Specifically, we formulate MTG as a fully differentiable pruning problem on an adaptive network architecture determined by an underlying Categorical distribution. To categorize N tasks into K groups (represented by K encoder branches), we initially set up KN task heads, where each branch connects to all N task heads to exploit the high-order task affinity. We then gradually prune the KN heads down to N by learning a relaxed differentiable Categorical distribution, ensuring that each task is exclusively and uniquely categorized into only one branch. Extensive experiments on the CelebA and Taskonomy datasets with detailed ablations show the promising performance and efficiency of our method. The code is available at https://github.com/ethanygao/DMTG.
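The relaxed Categorical assignment at the heart of this formulation is commonly implemented with the Gumbel-softmax trick. Below is a minimal NumPy sketch of that idea, assuming a per-task logit matrix over K groups; the sizes N=9, K=3 and all names are illustrative, not taken from the paper:

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Sample a relaxed one-hot vector per row from a Categorical
    distribution via the Gumbel-softmax trick (differentiable in logits)."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Gumbel(0, 1) noise makes the argmax a Categorical sample
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + g) / tau
    y = np.exp(y - y.max(axis=-1, keepdims=True))  # stable softmax
    return y / y.sum(axis=-1, keepdims=True)

N, K = 9, 3                      # hypothetical: 9 tasks into 3 groups
logits = np.zeros((N, K))        # learnable task-to-branch assignment logits
assign = gumbel_softmax(logits, tau=0.5)

# As tau -> 0, each row approaches a one-hot vector, i.e. each task is
# exclusively categorized into a single branch; the surplus heads are pruned.
groups = np.argmax(assign, axis=1)  # discrete group index of each task
```

During training, the soft `assign` matrix can weight the KN head outputs so gradients reach all branches, while annealing `tau` sharpens it toward the final hard grouping.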