Multi-task learning has proven effective at improving performance on correlated tasks. Most existing methods use a backbone to extract initial features, followed by an independent branch for each task; information is usually exchanged between the branches by concatenating or summing their feature maps. However, this form of exchange does not directly account for the local characteristics of the image, nor for the level of importance of, or correlation between, the tasks. In this paper, we propose MTLSegFormer, a semantic segmentation method that combines multi-task learning with attention mechanisms. After backbone feature extraction, two feature maps are learned for each task. The first map learns features related to its own task, while the second is obtained by applying learned visual attention to locally re-weight the feature maps of the other tasks. In this way, weights are assigned to the local regions of the other tasks' feature maps that are most important for the specific task. Finally, the two maps are combined and used to solve that task. We evaluated the method on two challenging problems with correlated tasks and observed a significant improvement in accuracy, mainly in tasks that depend strongly on the others.
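The two-map idea described in the abstract can be illustrated with a minimal sketch. This is not the MTLSegFormer implementation: it assumes plain scaled dot-product attention over all positions of the other tasks, uses identity projections in place of learned query/key/value weights, and combines the two maps by elementwise sum. The function names (`cross_task_attention`, `fuse`, `multitask_features`) are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_task_attention(own, others):
    """Re-weight the other tasks' features for one target task.

    own:    list of N feature vectors (one per spatial position) of the target task.
    others: list of feature maps (each a list of feature vectors) of the other tasks.
    Assumption: identity projections stand in for learned Q/K/V weights.
    """
    # Every position of every other task serves as a key and as a value.
    memory = [vec for fmap in others for vec in fmap]
    scale = 1.0 / math.sqrt(len(own[0]))
    attended = []
    for query in own:
        # One weight per position of the other tasks, summing to 1.
        weights = softmax([dot(query, key) * scale for key in memory])
        attended.append([
            sum(w * value[c] for w, value in zip(weights, memory))
            for c in range(len(query))
        ])
    return attended


def fuse(own, attended):
    """Combine the two maps; elementwise sum is an assumption of this sketch."""
    return [[a + b for a, b in zip(u, v)] for u, v in zip(own, attended)]


def multitask_features(task_maps):
    """task_maps: dict mapping task name -> feature map (list of N vectors)."""
    fused = {}
    for name, own in task_maps.items():
        others = [fmap for other, fmap in task_maps.items() if other != name]
        fused[name] = fuse(own, cross_task_attention(own, others))
    return fused


if __name__ == "__main__":
    # Two toy tasks, two spatial positions, two channels each.
    maps = {
        "task_a": [[1.0, 0.0], [0.0, 1.0]],
        "task_b": [[2.0, 0.0], [0.0, 2.0]],
    }
    for task, fmap in multitask_features(maps).items():
        print(task, fmap)
```

In this sketch each task keeps its own feature map unchanged and adds a second map built from the other tasks, so positions in the other tasks that resemble the query position contribute more. A trained model would replace the identity projections with learned ones and might restrict attention to a local neighbourhood.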