State-of-the-art techniques in weakly-supervised semantic segmentation (WSSS) using image-level labels exhibit severe performance degradation on driving scene datasets such as Cityscapes. To address this challenge, we develop a new WSSS framework tailored to driving scene datasets. Based on an extensive analysis of dataset characteristics, we adopt Contrastive Language-Image Pre-training (CLIP) as our baseline for obtaining pseudo-masks. However, CLIP introduces two key challenges: (1) pseudo-masks from CLIP poorly represent small object classes, and (2) these masks contain notable noise. We address each issue as follows. (1) We devise Global-Local View Training, which seamlessly incorporates small-scale patches during model training and thereby enhances the model's ability to handle small yet critical objects in driving scenes (e.g., traffic lights). (2) We introduce Consistency-Aware Region Balancing (CARB), a novel technique that distinguishes reliable regions from noisy ones by evaluating the consistency between CLIP masks and segmentation predictions, and that prioritizes reliable pixels over noisy pixels via adaptive loss weighting. Notably, the proposed method achieves 51.8\% mIoU on the Cityscapes test set, showcasing its potential as a strong WSSS baseline for driving scene datasets. Experimental results on CamVid and WildDash2 demonstrate that our method is effective across diverse datasets, including small-scale datasets and visually challenging conditions. The code is available at https://github.com/k0u-id/CARB.
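The abstract does not give CARB's exact weighting rule, so the following is only a minimal NumPy sketch of the general idea, under stated assumptions. Pixels where the segmentation prediction agrees with the CLIP pseudo-mask are treated as reliable and all others as noisy. The cross-entropy is averaged separately over each region, and the noisy-region loss is rescaled so that it never outweighs the reliable-region loss. The function name `consistency_aware_loss`, the `ignore_index=255` convention, and the `min(1, loss_c / loss_n)` weight are hypothetical choices for illustration. They are not taken from the authors' code at https://github.com/k0u-id/CARB.

```python
import numpy as np


def softmax(logits, axis=0):
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def consistency_aware_loss(logits, clip_mask, ignore_index=255, eps=1e-8):
    """Hypothetical CARB-style loss (illustrative sketch, not the paper's formula).

    logits:    (C, H, W) segmentation scores for one image.
    clip_mask: (H, W) integer pseudo-labels derived from CLIP.
    Returns (total_loss, fraction_of_valid_pixels_that_are_consistent).
    """
    probs = softmax(logits, axis=0)
    pred = probs.argmax(axis=0)
    valid = clip_mask != ignore_index

    # Per-pixel cross-entropy against the CLIP pseudo-mask.
    # Ignored pixels get a dummy label 0 and are masked out below.
    safe_mask = np.where(valid, clip_mask, 0)
    h_idx, w_idx = np.indices(clip_mask.shape)
    ce = -np.log(probs[safe_mask, h_idx, w_idx] + eps)

    # Reliable pixels: prediction agrees with the CLIP mask. Noisy pixels: the rest.
    consistent = valid & (pred == clip_mask)
    inconsistent = valid & ~consistent
    loss_c = ce[consistent].mean() if consistent.any() else 0.0
    loss_n = ce[inconsistent].mean() if inconsistent.any() else 0.0

    # Assumed adaptive weight: shrink the noisy-region loss so it never exceeds
    # the reliable-region loss. In an autodiff framework this weight would
    # typically be computed without gradients (e.g. detached).
    weight_n = min(1.0, loss_c / (loss_n + eps)) if loss_n > 0 else 0.0
    total = loss_c + weight_n * loss_n
    return total, consistent.sum() / max(valid.sum(), 1)
```

For example, if a single pixel's CLIP label disagrees with a confident prediction, plain averaged cross-entropy is dominated by that pixel. This sketch instead caps its contribution at the reliable-region loss. The cap is one simple way to prioritize reliable pixels, and the paper may use a different balancing scheme.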