In this study, we address the challenge of multi-task dense prediction, covering tasks such as semantic segmentation, depth estimation, and surface normal estimation, under partially annotated data (MTPSL), where each training image lacks labels for some tasks. Because these pixel-wise dense tasks are closely inter-related, we focus on mining and capturing cross-task relationships. Existing solutions typically learn global image representations for cross-task matching at the image level, a constraint that sacrifices the finer structures within images. Local matching would be a natural remedy, but the lack of precise region supervision makes local alignment difficult. The Segment Anything Model (SAM) offers a way forward by providing high-quality region detection at no annotation cost. Given SAM-detected regions, the remaining challenge is aligning the representations within them. Rather than directly learning a monolithic image representation as in conventional methods, we model each region-wise representation as a Gaussian distribution. Aligning these distributions between corresponding regions across tasks provides greater flexibility and capacity to capture intra-region structures and accommodates a broader range of tasks. This approach substantially improves our ability to capture cross-task relationships, yielding better overall performance in partially supervised multi-task dense prediction. Extensive experiments on two widely used benchmarks demonstrate the effectiveness of our method, which achieves state-of-the-art performance even compared with fully supervised methods.
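The abstract does not specify the exact alignment objective, so the following is only a minimal sketch of the general idea: fit a Gaussian to the features inside a SAM-detected region for each task, then penalize the distance between the two distributions. The function names, the diagonal-covariance assumption, and the use of a squared 2-Wasserstein distance as the alignment loss are illustrative choices, not the paper's method.

```python
import numpy as np

def region_gaussian(features, mask):
    """Fit a diagonal Gaussian to the features of pixels inside a region mask.

    features: (H, W, C) feature map from one task branch.
    mask:     (H, W) boolean mask of a SAM-detected region.
    Returns (mean, std), each of shape (C,).
    """
    region = features[mask]                # (N, C) features inside the region
    mu = region.mean(axis=0)
    sigma = region.std(axis=0) + 1e-6     # epsilon for numerical stability
    return mu, sigma

def wasserstein2_diag(mu1, sigma1, mu2, sigma2):
    """Squared 2-Wasserstein distance between two diagonal Gaussians."""
    return np.sum((mu1 - mu2) ** 2) + np.sum((sigma1 - sigma2) ** 2)

# Toy example: two task-specific feature maps and one shared SAM region.
rng = np.random.default_rng(0)
feat_a = rng.normal(size=(8, 8, 4))                  # e.g. segmentation branch
feat_b = feat_a + 0.1 * rng.normal(size=(8, 8, 4))   # e.g. depth branch (related)
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True                                # one SAM-detected region

mu_a, sig_a = region_gaussian(feat_a, mask)
mu_b, sig_b = region_gaussian(feat_b, mask)
loss = wasserstein2_diag(mu_a, sig_a, mu_b, sig_b)   # alignment loss for this region
```

Aligning full distributions rather than single pooled vectors is what gives the method room to preserve intra-region structure: two regions with the same mean feature but different internal variation are still distinguished.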