Previous multi-task dense prediction methods based on the Mixture of Experts (MoE) have achieved strong performance, but they neglect the importance of explicitly modeling the global relations among all tasks. In this paper, we present a novel decoder-focused method for multi-task dense prediction, called Mixture-of-Low-Rank-Experts (MLoRE). To model global task relationships, MLoRE adds a generic convolution path to the original MoE structure; every task feature passes through this path, which provides explicit parameter sharing across tasks. Furthermore, to control the parameters and computational cost that grow with the number of experts, we take inspiration from LoRA and propose to use the low-rank form of a vanilla convolution in each expert network. Because the low-rank experts have fewer parameters and can be dynamically re-parameterized into the generic convolution, the parameter count and computational cost change little as more experts are added. Benefiting from this design, we increase the number of experts and their receptive fields to enlarge the representation capacity, facilitating the learning of multiple dense prediction tasks in a unified network. Extensive experiments on the PASCAL-Context and NYUD-v2 benchmarks show that MLoRE outperforms previous state-of-the-art methods on all metrics. Our code is available at https://github.com/YuqiYang213/MLoRE.
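The claim that low-rank experts can be folded into the generic convolution rests on linearity: if each expert computes B_i A_i x and is scaled by a gate weight g_i, then W_0 x + Σ g_i B_i A_i x = (W_0 + Σ g_i B_i A_i) x, so the whole mixture costs one convolution at inference. The sketch below illustrates this identity under simplifying assumptions that are ours, not the paper's: every path is a 1x1 convolution (a per-pixel linear map on channels), the gate weights are fixed scalars for the input at hand, and all function names (`moe_forward`, `merged_weight`, `param_count`) are hypothetical.

```python
"""Sketch: gated low-rank experts folded into one generic 1x1 convolution.

Assumptions (illustrative, not taken from the MLoRE code): every path is a
1x1 convolution, i.e. a per-pixel linear map over channels, and the gate
weights are fixed scalars for the current input.
"""
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Expert = Tuple[Matrix, Matrix]  # (B: C_out x r, A: r x C_in)


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a: Matrix, b: Matrix, scale: float = 1.0) -> Matrix:
    """Element-wise a + scale * b."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def apply_1x1(weight: Matrix, pixels: Matrix) -> Matrix:
    """1x1 conv: weight is (C_out, C_in), pixels is (C_in, H*W)."""
    return matmul(weight, pixels)


def moe_forward(w0: Matrix, experts: Sequence[Expert],
                gates: Sequence[float], pixels: Matrix) -> Matrix:
    """Generic path plus gated low-rank experts, evaluated path by path."""
    out = apply_1x1(w0, pixels)
    for (b, a), g in zip(experts, gates):
        # Low-rank expert: project C_in -> r with A, then r -> C_out with B.
        out = add(out, apply_1x1(b, apply_1x1(a, pixels)), g)
    return out


def merged_weight(w0: Matrix, experts: Sequence[Expert],
                  gates: Sequence[float]) -> Matrix:
    """Fold every gated expert B_i A_i into the generic weight W0."""
    w = w0
    for (b, a), g in zip(experts, gates):
        w = add(w, matmul(b, a), g)
    return w


def param_count(c_in: int, c_out: int, rank: int,
                n_experts: int) -> Tuple[int, int]:
    """Weights of n full-rank 1x1 experts vs n rank-r 1x1 experts."""
    full = n_experts * c_out * c_in
    low = n_experts * rank * (c_in + c_out)
    return full, low
```

Evaluating `moe_forward` path by path and applying the single weight from `merged_weight` give the same output, which is why adding experts barely changes inference cost in this simplified setting. With, for example, 256 channels, rank 16, and 15 experts, `param_count` gives 983,040 weights for full-rank experts versus 122,880 for low-rank ones. These numbers are purely illustrative and are not the paper's configuration.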