A key bottleneck in deploying state-of-the-art semantic segmentation networks in the real world is the availability of training labels. Standard semantic segmentation networks require massive amounts of pixel-wise annotated data to reach state-of-the-art prediction quality. Hence, several works focus on semantic segmentation networks trained with only image-level annotations. However, when scrutinizing the state-of-the-art results in more detail, we notice that although the approaches are very close to each other in average prediction quality, each performs well on some classes while providing low quality on others. To address this problem, we propose a novel framework, AutoEnsemble, which ensembles, on a class-wise level, the "pseudo-labels" produced by a given set of segmentation techniques. Pseudo-labels are the pixel-wise predictions of image-level semantic segmentation frameworks, which are used to train the final segmentation model. Our pseudo-labels seamlessly combine the strong points of multiple segmentation techniques to reach superior prediction quality. We reach up to 2.4% improvement over AutoEnsemble's components. An exhaustive analysis demonstrates AutoEnsemble's effectiveness over state-of-the-art frameworks for image-level semantic segmentation.
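The abstract does not spell out how the class-wise ensemble is formed, so the following is only a minimal sketch of one plausible reading, not AutoEnsemble's actual procedure. It assumes each component method provides a pseudo-label map per image and a per-class quality score (for example, a per-class IoU measured on held-out data). The function name `classwise_ensemble`, the score matrix, the tie-breaking rule, and the ignore index 255 are all hypothetical choices made for illustration.

```python
import numpy as np

IGNORE = 255  # assumed "ignore" index for pixels no selected method claims


def classwise_ensemble(pseudo_labels, class_scores, ignore_index=IGNORE):
    """Combine pseudo-labels from K methods on a class-wise level.

    pseudo_labels: int array of shape (K, H, W), one label map per method.
    class_scores:  float array of shape (K, C), assumed per-class quality
                   of each method (hypothetical input, e.g. per-class IoU).
    """
    pseudo_labels = np.asarray(pseudo_labels)
    class_scores = np.asarray(class_scores, dtype=float)
    num_classes = class_scores.shape[1]

    # For every class, pick the method with the highest score on that class.
    best_method = class_scores.argmax(axis=0)  # shape (C,)

    out = np.full(pseudo_labels.shape[1:], ignore_index, dtype=np.int64)
    claim = np.full(pseudo_labels.shape[1:], -np.inf)  # score of current owner

    for c in range(num_classes):
        k = best_method[c]
        score = class_scores[k, c]
        # Take class c only from its best method; if another class already
        # claimed the pixel, the claim with the higher score wins (assumption).
        mask = (pseudo_labels[k] == c) & (score > claim)
        out[mask] = c
        claim[mask] = score
    return out


# Two hypothetical methods, three classes (0 = background).
labels_a = np.array([[0, 1], [1, 0]])
labels_b = np.array([[0, 1], [2, 2]])
scores = np.array([[0.9, 0.6, 0.5],   # method A
                   [0.8, 0.4, 0.7]])  # method B
combined = classwise_ensemble(np.stack([labels_a, labels_b]), scores)
print(combined)
```

In this toy example, classes 0 and 1 are taken from method A and class 2 from method B. The resulting map is `[[0, 1], [2, 0]]`: the bottom-left pixel goes to B's class 2 because its score (0.7) exceeds A's score for class 1 (0.6), while the bottom-right pixel stays with A's background because 0.9 exceeds 0.7. Pixels that no selected method claims keep the ignore index, so a downstream training loss could skip them.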