SILOP: An Automated Framework for Semantic Segmentation Using Image Labels Based on Object Perimeters

Achieving high-quality semantic segmentation predictions using only image-level labels enables a new level of real-world applicability. Although state-of-the-art networks deliver reliable predictions, the volume of handcrafted pixel-wise annotations required to achieve these results is infeasible in many real-world applications. Hence, several works have already targeted this bottleneck, using classifier-based methods such as Class Activation Maps~\cite{CAM} (CAMs) as a base. To address the fuzzy borders and incomplete predictions of CAMs, state-of-the-art approaches rely only on adding regularization terms to the classifier loss or on applying pixel-similarity-based refinement after the fact. We propose a framework that introduces an additional module using object perimeters for improved saliency. We define object perimeter information as the line separating the object and background. Our new PerimeterFit module is applied to pre-refine the CAM predictions before the pixel-similarity-based network is used. In this way, PerimeterFit increases the quality of the CAM prediction while simultaneously improving the false negative rate. We investigate a wide range of state-of-the-art unsupervised semantic segmentation networks and edge detection techniques to create useful perimeter maps, which enable our framework to predict object locations with sharper perimeters. We achieve an improvement of up to 1.5% over frameworks without our PerimeterFit module. We conduct an exhaustive analysis to illustrate that SILOP enhances existing state-of-the-art frameworks for image-level-based semantic segmentation. The framework is open-source and accessible online at https://github.com/ErikOstrowski/SILOP.
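
To make the idea of pre-refining a CAM with perimeter information concrete, the following is a minimal sketch and not the PerimeterFit module itself. It assumes a CAM given as a 2D grid of scores in [0, 1] and a binary perimeter map (for example, from an edge detector), and it grows confident CAM seeds until they reach a perimeter pixel. The function name `perimeter_guided_refine`, the `seed_threshold` parameter, and the region-growing strategy are all hypothetical choices made for illustration.

```python
from collections import deque


def perimeter_guided_refine(cam, perimeter, seed_threshold=0.5):
    """Grow confident CAM seeds until they are stopped by perimeter pixels.

    cam: 2D list of floats in [0, 1] (class activation scores).
    perimeter: 2D list of bools, True where an object boundary was detected.
    Returns a 2D list of bools (the refined foreground mask).
    Illustrative assumption only; not the authors' PerimeterFit algorithm.
    """
    height, width = len(cam), len(cam[0])
    mask = [[False] * width for _ in range(height)]
    queue = deque()

    # Seeds: confident CAM pixels that do not lie on a perimeter line.
    for y in range(height):
        for x in range(width):
            if cam[y][x] >= seed_threshold and not perimeter[y][x]:
                mask[y][x] = True
                queue.append((y, x))

    # 4-connected region growing; perimeter pixels block the expansion.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < height and 0 <= nx < width:
                if not mask[ny][nx] and not perimeter[ny][nx]:
                    mask[ny][nx] = True
                    queue.append((ny, nx))
    return mask


# Toy 7x7 example: a closed square perimeter encloses a 3x3 object,
# while the CAM fires only on the single center pixel.
size = 7
perimeter = [
    [
        (y in (1, 5) and 1 <= x <= 5) or (x in (1, 5) and 1 <= y <= 5)
        for x in range(size)
    ]
    for y in range(size)
]
cam = [[0.0] * size for _ in range(size)]
cam[3][3] = 0.9

refined = perimeter_guided_refine(cam, perimeter)
filled = sum(cell for row in refined for cell in row)
print(filled)  # number of foreground pixels after refinement
```

In this toy case the incomplete CAM response (one pixel) expands to the whole enclosed region and stops at the perimeter, which mirrors the two effects described above: fewer missed object pixels and a sharper boundary. A practical pipeline would additionally have to cope with open or noisy perimeters, which this sketch does not handle.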
