Task-conditional architectures offer an advantage in parameter efficiency but fall short of state-of-the-art multi-decoder methods in performance. Trading off performance against the number of model parameters is an important and difficult problem. In this paper, we introduce a simple and lightweight task-conditional model called Prompt Guided Transformer (PGT) to address this challenge. Our approach designs a Prompt-conditioned Transformer block, which incorporates task-specific prompts in the self-attention mechanism to achieve global dependency modeling and parameter-efficient feature adaptation across multiple tasks. This block is integrated into both the shared encoder and decoder, enhancing the capture of intra- and inter-task features. Moreover, we design a lightweight decoder that accounts for only 2.7% of the total model parameters, further reducing parameter usage. Extensive experiments on two multi-task dense prediction benchmarks, PASCAL-Context and NYUD-v2, demonstrate that our approach achieves state-of-the-art results among task-conditional methods while using fewer parameters, and strikes a favorable balance between performance and parameter count.
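To make the idea of task-specific prompts inside self-attention concrete, the following is a minimal NumPy sketch, not the PGT implementation. The abstract does not specify how prompts enter the attention computation, so this sketch assumes a prefix-style design in which a few learnable prompt tokens per task are prepended to the keys and values. The class name `PromptConditionedAttention`, the `num_prompts` parameter, the single-head layout, and the random initialization are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class PromptConditionedAttention:
    """Single-head self-attention with per-task prompt tokens (illustrative only).

    Assumption: prompts are prepended to the key/value sequence. The query,
    key, and value projections are shared by all tasks; only the prompts
    are task-specific.
    """

    def __init__(self, dim, tasks, num_prompts=4, seed=0):
        rng = np.random.default_rng(seed)
        scale = dim ** -0.5
        self.dim = dim
        # Shared projection matrices, reused for every task.
        self.w_q = rng.normal(0.0, scale, (dim, dim))
        self.w_k = rng.normal(0.0, scale, (dim, dim))
        self.w_v = rng.normal(0.0, scale, (dim, dim))
        # One small prompt matrix per task: the only task-specific parameters.
        self.prompts = {
            task: rng.normal(0.0, scale, (num_prompts, dim)) for task in tasks
        }

    def __call__(self, tokens, task):
        """tokens: (n, dim) array. Returns (output, attention weights)."""
        # Context = task prompts followed by the image tokens.
        context = np.concatenate([self.prompts[task], tokens], axis=0)
        q = tokens @ self.w_q            # (n, dim)
        k = context @ self.w_k           # (p + n, dim)
        v = context @ self.w_v           # (p + n, dim)
        attn = softmax(q @ k.T / np.sqrt(self.dim))  # (n, p + n)
        return attn @ v, attn

    def shared_param_count(self):
        return self.w_q.size + self.w_k.size + self.w_v.size

    def per_task_param_count(self):
        return next(iter(self.prompts.values())).size


# Hypothetical usage: the same layer serves two dense prediction tasks.
layer = PromptConditionedAttention(dim=16, tasks=["semseg", "depth"])
features = np.random.default_rng(1).normal(size=(10, 16))
semseg_out, _ = layer(features, "semseg")
depth_out, _ = layer(features, "depth")
```

Under these assumptions, every token attends to all other tokens (global dependency modeling) and to the prompts of the selected task, so switching tasks changes the output while the projection weights stay shared. Each additional task costs only `num_prompts * dim` parameters, compared with `3 * dim * dim` for the shared projections.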