Multi-Task Learning (MTL) involves the concurrent training of multiple tasks and offers notable advantages for dense prediction tasks in computer vision. MTL not only reduces training and inference time compared with maintaining multiple single-task models, but also enhances task accuracy through the interaction of the tasks. However, existing methods face limitations. They often rely on suboptimal cross-task interactions, resulting in task-specific predictions with poor geometric and predictive coherence. In addition, many approaches use inadequate loss weighting strategies, which fail to address the inherent variability in task evolution during training. To overcome these challenges, we propose an advanced MTL model specifically designed for dense vision tasks. Our model leverages state-of-the-art vision transformers with task-specific decoders. To enhance cross-task coherence, we introduce a trace-back method that improves both cross-task geometric and predictive features. Furthermore, we present a novel dynamic task balancing approach that projects task losses onto a common scale and prioritizes more challenging tasks during training. Extensive experiments demonstrate the superiority of our method, establishing new state-of-the-art performance across two benchmark datasets. The code is available at: https://github.com/Klodivio355/MT-CP
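The abstract does not spell out the balancing formulation, but the idea of projecting task losses onto a common scale and up-weighting harder tasks can be illustrated with a minimal sketch. The normalization choice here (dividing each loss by its initial value, then applying a softmax over the normalized losses) is an assumption for illustration only, not the paper's actual method; `balance_losses` and the `alpha` temperature are hypothetical names.

```python
import math

def balance_losses(losses, initial_losses, alpha=1.0):
    """Illustrative dynamic task balancing (assumed formulation).

    Projects each task loss onto a common scale by normalizing against
    its initial value, then assigns larger weights to tasks whose
    normalized loss remains high, i.e., the harder tasks.
    """
    # Common-scale projection: loss relative to its starting value.
    ratios = [l / l0 for l, l0 in zip(losses, initial_losses)]
    # Softmax over normalized losses; alpha controls how strongly
    # harder tasks are prioritized.
    exps = [math.exp(alpha * r) for r in ratios]
    total = sum(exps)
    # Rescale so the weights sum to the number of tasks.
    weights = [len(losses) * e / total for e in exps]
    # Weighted sum forms the combined training objective.
    combined = sum(w * l for w, l in zip(weights, losses))
    return combined, weights

# Example: task 0 has barely improved (ratio 0.9), task 1 has nearly
# converged (ratio 0.1), so task 0 receives the larger weight.
combined, weights = balance_losses([0.9, 0.1], [1.0, 1.0])
```

Under this assumed scheme, a task whose loss decays quickly is gradually de-emphasized, while a stagnating task keeps receiving gradient signal.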