Previous multi-task dense prediction methods based on the Mixture of Experts (MoE) have achieved strong performance, but they neglect the importance of explicitly modeling the global relations among all tasks. In this paper, we present a novel decoder-focused method for multi-task dense prediction, called Mixture-of-Low-Rank-Experts (MLoRE). To model the global task relationships, MLoRE adds a generic convolution path to the original MoE structure, where each task feature can go through this path for explicit parameter sharing. Furthermore, to control the parameters and computational cost brought by the increase in the number of experts, we take inspiration from LoRA and propose to leverage the low-rank format of a vanilla convolution in the expert network. Since the low-rank experts have fewer parameters and can be dynamically parameterized into the generic convolution, the parameters and computational cost do not change much as the number of experts increases. Benefiting from this design, we increase the number of experts and their receptive fields to enlarge the representation capacity, facilitating the learning of multiple dense tasks in a unified network. Extensive experiments on the PASCAL-Context and NYUD-v2 benchmarks show that our MLoRE achieves superior performance compared to previous state-of-the-art methods on all metrics. Our code is available at https://github.com/YuqiYang213/MLoRE.
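
To illustrate why low-rank experts can be folded into the generic convolution at little extra cost, the following is a minimal sketch and not the released MLoRE implementation. It assumes 1x1 convolutions (so each convolution reduces to a matrix acting on one pixel's channel vector), purely linear experts, and fixed gate values; in practice the gates would come from a router. All names (`merge_experts`, `forward_separate`, `gates`, and the sizes used) are hypothetical.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matvec(m, v):
    """Apply matrix m to vector v (a 1x1 convolution at a single pixel)."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def forward_separate(w_generic, experts, gates, x):
    """Run the generic path and every gated low-rank expert as separate branches."""
    out = matvec(w_generic, x)
    for (down, up), g in zip(experts, gates):
        y = matvec(up, matvec(down, x))  # project C -> r, then r -> C
        out = [o + g * v for o, v in zip(out, y)]
    return out

def merge_experts(w_generic, experts, gates):
    """Fold the gated experts into one weight: W + sum_i g_i * (up_i @ down_i)."""
    merged = [row[:] for row in w_generic]
    for (down, up), g in zip(experts, gates):
        delta = matmul(up, down)  # (C x r) @ (r x C) -> C x C
        for i, row in enumerate(delta):
            for j, val in enumerate(row):
                merged[i][j] += g * val
    return merged

# Assumed toy sizes: 8 channels, rank 2, 4 experts.
rng = random.Random(0)
channels, rank, num_experts = 8, 2, 4
w_generic = rand_matrix(channels, channels, rng)
experts = [
    (rand_matrix(rank, channels, rng), rand_matrix(channels, rank, rng))
    for _ in range(num_experts)
]
gates = [0.1, 0.2, 0.3, 0.4]  # hypothetical router outputs
x = [rng.uniform(-1.0, 1.0) for _ in range(channels)]

separate = forward_separate(w_generic, experts, gates, x)
merged = matvec(merge_experts(w_generic, experts, gates), x)
max_diff = max(abs(a - b) for a, b in zip(separate, merged))

# Parameter count per expert for a 1x1 convolution: full-rank vs. low-rank.
full_expert_params = channels * channels
low_rank_params = 2 * rank * channels
```

Under these assumptions, the merged weight and the separate branches agree up to floating-point error, so adding experts changes only the small low-rank factors rather than the cost of the final convolution. The equivalence relies on the expert path being linear; an expert containing a nonlinearity could not be merged this way.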