One key bottleneck in employing state-of-the-art semantic segmentation networks in the real world is the availability of training labels. Conventional semantic segmentation networks require massive amounts of pixel-wise annotated labels to reach state-of-the-art prediction quality. Hence, several works focus on semantic segmentation networks trained with only image-level annotations. However, when scrutinizing the results of state-of-the-art approaches in more detail, we notice that although they are remarkably close to each other in average prediction quality, different approaches perform better on different classes while providing low quality on others. To address this problem, we propose a novel framework, ISLE, which employs an ensemble of the "pseudo-labels" of a given set of different semantic segmentation techniques on a class-wise level. Pseudo-labels are the pixel-wise predictions of the image-level semantic segmentation frameworks, and they are used to train the final segmentation model. Our pseudo-labels seamlessly combine the strong points of multiple segmentation techniques to reach superior prediction quality. We reach up to 2.4% improvement over ISLE's individual components. An exhaustive analysis was performed to demonstrate ISLE's effectiveness over state-of-the-art frameworks for image-level semantic segmentation.
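To make the idea of a class-wise ensemble of pseudo-labels more concrete, the following is a minimal sketch and not ISLE's actual procedure. It assumes that each class has already been assigned to the one component method that is trusted for it (for example, the method with the best per-class score on held-out data), and that a pixel keeps a class only if the trusted method for that class predicted it. The function name `classwise_ensemble`, the mapping `best_method_for_class`, and the tie-breaking rule are all hypothetical choices made for illustration.

```python
from typing import Dict, List

# A pseudo-label map: H x W grid of class ids (0 is assumed to be background).
LabelMap = List[List[int]]


def classwise_ensemble(
    pseudo_labels: Dict[str, LabelMap],
    best_method_for_class: Dict[int, str],
    background: int = 0,
) -> LabelMap:
    """Merge pseudo-labels from several methods on a class-wise level.

    Assumption (not from the paper): for every class, only the method
    listed in `best_method_for_class` is trusted. If two trusted methods
    disagree on a pixel, the method that comes first in `pseudo_labels`
    wins; pixels claimed by no trusted method stay background.
    """
    methods = list(pseudo_labels)
    height = len(pseudo_labels[methods[0]])
    width = len(pseudo_labels[methods[0]][0])
    merged = [[background] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            for name in methods:
                cls = pseudo_labels[name][y][x]
                # Keep the prediction only if this method is the trusted
                # one for the class it predicts at this pixel.
                if cls != background and best_method_for_class.get(cls) == name:
                    merged[y][x] = cls
                    break
    return merged


# Toy example: method "a" is trusted for class 1, method "b" for class 2.
labels = {
    "a": [[1, 1, 0], [0, 2, 2]],
    "b": [[1, 0, 0], [2, 2, 0]],
}
trusted = {1: "a", 2: "b"}
merged = classwise_ensemble(labels, trusted)
```

In this toy case the merged map takes class 1 from method "a" and class 2 from method "b", so class-2 pixels predicted only by "a" fall back to background. The merged map would then play the role of the pseudo-labels used to train the final segmentation model.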