We aim to address Multi-Task Learning (MTL) with a large number of tasks by Multi-Task Grouping (MTG). Given N tasks, we propose to simultaneously identify the best task groups from 2^N candidates and train the model weights in one shot, with the high-order task affinity fully exploited. This is distinct from the pioneering methods that sequentially identify the groups and then train the model weights, where the group identification often relies on heuristics. As a result, our method not only improves the training efficiency, but also mitigates the objective bias introduced by the sequential procedures, which potentially leads to a suboptimal solution. Specifically, we formulate MTG as a fully differentiable pruning problem on an adaptive network architecture determined by an underlying Categorical distribution. To categorize N tasks into K groups (represented by K encoder branches), we initially set up KN task heads, where each branch connects to all N task heads to exploit the high-order task affinity. We then gradually prune the KN heads down to N by learning a relaxed differentiable Categorical distribution, ensuring that each task is exclusively and uniquely categorized into only one branch. Extensive experiments on the CelebA and Taskonomy datasets with detailed ablations show the promising performance and efficiency of our method. The code is available at https://github.com/ethanygao/DMTG.
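The relaxed Categorical assignment at the heart of this formulation is commonly implemented with the Gumbel-softmax trick. Below is a minimal NumPy sketch of that idea, assuming a per-task logit matrix over K groups; the sizes N=9, K=3 and all names are illustrative, not taken from the paper:

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Sample a relaxed one-hot vector per row from a Categorical
    distribution via the Gumbel-softmax trick (differentiable in logits)."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Gumbel(0, 1) noise makes the argmax a Categorical sample
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + g) / tau
    y = np.exp(y - y.max(axis=-1, keepdims=True))  # stable softmax
    return y / y.sum(axis=-1, keepdims=True)

N, K = 9, 3                      # hypothetical: 9 tasks into 3 groups
logits = np.zeros((N, K))        # learnable task-to-branch assignment logits
assign = gumbel_softmax(logits, tau=0.5)

# As tau -> 0, each row approaches a one-hot vector, i.e. each task is
# exclusively categorized into a single branch; the surplus heads are pruned.
groups = np.argmax(assign, axis=1)  # discrete group index of each task
```

During training, the soft `assign` matrix can weight the KN head outputs so gradients reach all branches, while annealing `tau` sharpens it toward the final hard grouping.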