Comprehensive scene understanding is a critical enabler of robot autonomy. Semantic segmentation is one of the key scene understanding tasks which is pivotal for several robotics applications including autonomous driving, domestic service robotics, last mile delivery, amongst many others. Semantic segmentation is a dense prediction task that aims to provide a scene representation in which each pixel of an image is assigned a semantic class label. Therefore, semantic segmentation considers the full scene context, incorporating the object category, location, and shape of all the scene elements, including the background. Numerous algorithms have been proposed for semantic segmentation over the years. However, the recent advances in deep learning combined with the boost in the computational capacity and the availability of large-scale labeled datasets have led to significant advances in semantic segmentation. In this chapter, we introduce the task of semantic segmentation and present the deep learning techniques that have been proposed to address this task over the years. We first define the task of semantic segmentation and contrast it with other closely related scene understanding problems. We detail different algorithms and architectures for semantic segmentation and the commonly employed loss functions. Furthermore, we present an overview of datasets, benchmarks, and metrics that are used in semantic segmentation. We conclude the chapter with a discussion of challenges and opportunities for further research in this area.
翻译:全面的场景理解是实现机器人自主性的关键基础。语义分割作为核心场景理解任务之一,在自动驾驶、家庭服务机器人、最后一公里配送等众多机器人应用领域具有举足轻重的作用。语义分割是一项密集预测任务,旨在提供一种场景表征方式,使图像中的每个像素都被赋予一个语义类别标签。因此,语义分割需考虑完整场景上下文,涵盖所有场景元素(包括背景)的物体类别、位置和形状信息。多年来,学界已提出众多语义分割算法。然而,近年来深度学习技术的突破性进展,结合计算能力的提升及大规模标注数据集的可用性,使语义分割领域取得了重大进步。本章将系统阐述语义分割任务,并介绍学界为攻克该任务而提出的深度学习技术演进历程。我们首先界定语义分割任务的定义,并将其与其他密切相关的场景理解问题进行对比分析。随后详细阐述语义分割的不同算法、网络架构及常用损失函数。此外,我们系统梳理了语义分割中使用的数据集、基准测试及评价指标。最后,通过讨论该领域的研究挑战与未来机遇作结。