Semantic segmentation aims to robustly predict coherent class labels for entire regions of an image. It is a scene understanding task that powers real-world applications (e.g., autonomous navigation). One important application, the automated semantic understanding of pedestrian environments from imagery, enables remote mapping of accessibility features in street environments. This application, and others like it, requires detailed geometric information about geographic objects. Semantic segmentation is a prerequisite for this task because it maps contiguous regions of the same class as single entities. Importantly, applications of semantic segmentation such as ours do not depend on pixel-wise outcomes. However, most quantitative evaluation metrics (e.g., mean Intersection over Union) are based on pixel-wise similarity to a ground truth, so they do not capture a model's over- and under-segmentation behavior. Here, we introduce a new metric to assess region-based over- and under-segmentation. We analyze it and compare it to other metrics, demonstrating that our metric lends greater explainability to semantic segmentation model performance in real-world applications.
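The claim that pixel-wise metrics miss over-segmentation can be made concrete with a toy case. The sketch below is not the metric proposed here. It is a minimal illustration built on a hypothetical 4×6 label map with two classes and 4-connectivity. The helpers `iou_per_class`, `mean_iou`, `count_regions`, and `make_map` are names introduced only for this example. Two predictions mislabel the same number of pixels, so they receive identical mIoU. One trims the edge of the class-1 region, while the other cuts it into two fragments.

```python
from collections import deque


def iou_per_class(gt, pred, classes):
    """Pixel-wise IoU for each class: |GT ∩ Pred| / |GT ∪ Pred|."""
    scores = {}
    for c in classes:
        inter = union = 0
        for g_row, p_row in zip(gt, pred):
            for g, p in zip(g_row, p_row):
                in_g, in_p = g == c, p == c
                inter += int(in_g and in_p)
                union += int(in_g or in_p)
        scores[c] = inter / union if union else float("nan")
    return scores


def mean_iou(gt, pred, classes=(0, 1)):
    """Unweighted mean of the per-class IoU values."""
    scores = iou_per_class(gt, pred, classes)
    return sum(scores.values()) / len(scores)


def count_regions(label_map, c):
    """Count 4-connected regions of class c using breadth-first search."""
    rows, cols = len(label_map), len(label_map[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for k in range(cols):
            if label_map[r][k] != c or seen[r][k]:
                continue
            regions += 1  # found an unvisited pixel of class c: new region
            queue = deque([(r, k)])
            seen[r][k] = True
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and label_map[ny][nx] == c):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions


def make_map(class1_cols, rows=4, cols=6):
    """Hypothetical label map: class 1 in the given columns, class 0 elsewhere."""
    return [[1 if x in class1_cols else 0 for x in range(cols)] for _ in range(rows)]


gt = make_map({1, 2, 3, 4})          # one contiguous class-1 region
pred_edge = make_map({1, 2, 3})      # loses column 4: region shrinks but stays whole
pred_split = make_map({1, 2, 4})     # loses column 3: region splits into two pieces

if __name__ == "__main__":
    for name, pred in (("edge", pred_edge), ("split", pred_split)):
        print(name, round(mean_iou(gt, pred), 4), count_regions(pred, 1))
```

Both predictions mislabel 4 of the 16 class-1 pixels, so their mIoU values are identical. Only the region count exposes that `pred_split` breaks one ground-truth object into two. A region-based metric is designed to capture this distinction, which a pixel-wise score cannot.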