Weakly Supervised Semantic Segmentation (WSSS) trains a segmentation model with weak supervision, such as image-level labels. Despite the impressive achievements of recent WSSS methods, we find that weak labels with a high mean Intersection over Union (mIoU) do not guarantee high segmentation performance. Existing studies have emphasized that prioritizing precision and reducing noise are important for improving overall performance. In the same vein, we propose ORANDNet, an advanced ensemble approach tailored for WSSS. ORANDNet combines Class Activation Maps (CAMs) from two different classifiers to increase the precision of pseudo-masks (PMs). To further mitigate small noise in the PMs, we incorporate curriculum learning: the segmentation model is first trained on pairs of downsized images and their corresponding PMs, and then gradually transitions to the original-sized pairs. By combining the original CAMs of ResNet-50 and ViT, we significantly improve the segmentation performance over both the single-best model and the naive ensemble model. We further extend our ensemble method to CAMs from AMN (ResNet-like) and MCTformer (ViT-like) models, achieving performance benefits in advanced WSSS models. This highlights the potential of ORANDNet as a final add-on module for WSSS models.
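To make the precision versus mIoU point concrete, the following is a minimal toy sketch, not the ORANDNet architecture itself. The abstract does not specify how the two CAMs are combined, so the pixel-wise OR and AND operations here are an assumption suggested only by the method's name, and the names `combine`, `precision_and_iou`, `gt`, `mask_a`, and `mask_b` are hypothetical. It shows that an AND-style combination of two binarized masks can reach higher precision than either input while having a lower IoU than the better input.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary mask: 1 = foreground, 0 = background


def combine(a: Mask, b: Mask, mode: str) -> Mask:
    """Pixel-wise OR or AND of two binary masks of the same shape.

    Assumption: this is a toy combination rule for illustration only,
    not the combination used by ORANDNet.
    """
    if mode not in ("or", "and"):
        raise ValueError("mode must be 'or' or 'and'")
    op = (lambda x, y: x | y) if mode == "or" else (lambda x, y: x & y)
    return [[op(x, y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def precision_and_iou(pred: Mask, gt: Mask) -> Tuple[float, float]:
    """Foreground precision and IoU of `pred` against ground truth `gt`."""
    tp = fp = fn = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            tp += p & g          # predicted foreground, truly foreground
            fp += p & (1 - g)    # predicted foreground, truly background
            fn += (1 - p) & g    # missed foreground
    precision = tp / (tp + fp) if tp + fp else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, iou


# Hypothetical 2x4 toy data standing in for binarized CAMs of two classifiers.
gt = [[1, 1, 0, 0],
      [1, 1, 0, 0]]
mask_a = [[1, 1, 1, 0],
          [1, 0, 0, 0]]
mask_b = [[1, 1, 0, 0],
          [1, 1, 0, 1]]

for name, mask in [("a", mask_a), ("b", mask_b),
                   ("or", combine(mask_a, mask_b, "or")),
                   ("and", combine(mask_a, mask_b, "and"))]:
    print(name, precision_and_iou(mask, gt))
```

In this toy case the AND mask contains no false positives, whereas `mask_b` alone has the higher IoU; which of the two makes the better pseudo-mask for training is exactly the question the abstract raises.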
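The curriculum described above (small image and PM pairs first, original size later) can be sketched as a resolution schedule. This is a minimal sketch under stated assumptions: the abstract gives no schedule, so the linear ramp, the starting scale of 0.5, the ramp fraction, and the helper names `curriculum_scale`, `scaled_size`, and `resize_nearest` are all hypothetical. Nearest-neighbour resizing is used for the pseudo-mask so that class indices are never blended into invalid values.

```python
from typing import List, Tuple


def curriculum_scale(epoch: int, total_epochs: int,
                     start_scale: float = 0.5,
                     ramp_frac: float = 0.5) -> float:
    """Scale factor for training pairs at a given epoch.

    Assumption: a linear ramp from `start_scale` to 1.0 over the first
    `ramp_frac` of training, then original size for the remainder.
    """
    ramp_epochs = max(1, int(total_epochs * ramp_frac))
    if epoch >= ramp_epochs:
        return 1.0
    return start_scale + (1.0 - start_scale) * epoch / ramp_epochs


def scaled_size(height: int, width: int, scale: float) -> Tuple[int, int]:
    """Target (height, width) after applying `scale`, at least 1 pixel each."""
    return max(1, round(height * scale)), max(1, round(width * scale))


def resize_nearest(mask: List[List[int]], out_h: int, out_w: int) -> List[List[int]]:
    """Nearest-neighbour resize of a label mask given as a list of rows."""
    in_h, in_w = len(mask), len(mask[0])
    return [[mask[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


# Print the (hypothetical) training resolution per epoch for 512x512 inputs.
for epoch in range(10):
    scale = curriculum_scale(epoch, total_epochs=10)
    print(epoch, scaled_size(512, 512, scale))
```

In a real training loop the image would be resized to the same target size with an interpolating filter, while the pseudo-mask keeps nearest-neighbour sampling.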