Semantic Image Segmentation: Two Decades of Research

From arXiv. Pre-print of the book: G. Csurka, R. Volpi and B. Chidlovskii, Semantic Image Segmentation: Two Decades of Research, FTCGV (14): No. 1-2, http://dx.doi.org/10.1561/0600000095. The authors retained the copyright and are allowed to post it on arXiv. Research use only; commercial use or systematic downloading (by robots or other automatic processes) is prohibited.

Semantic image segmentation (SiS) plays a fundamental role in a broad variety of computer vision applications, providing key information for the global understanding of an image. This survey summarizes two decades of research in the field of SiS. We review the literature starting from early historical methods, followed by an overview of more recent deep learning methods, including the latest trend of using transformers. We complement the review by discussing particular cases of weak supervision and side machine learning techniques that can be used to improve semantic segmentation, such as curriculum, incremental or self-supervised learning. State-of-the-art SiS models rely on a large amount of annotated samples, which are more expensive to obtain than labels for tasks such as image classification. Since unlabeled data is significantly cheaper to obtain, it is not surprising that Unsupervised Domain Adaptation (UDA) has achieved broad success within the semantic segmentation community. Therefore, a second core contribution of this book is to summarize five years of a rapidly growing field, Domain Adaptation for Semantic Image Segmentation (DASiS), which combines the importance of semantic segmentation itself with the critical need to adapt segmentation models to new environments. In addition to providing a comprehensive survey of DASiS techniques, we also cover newer trends such as multi-domain learning, domain generalization, domain incremental learning, test-time adaptation and source-free domain adaptation. Finally, we conclude this survey by describing the datasets and benchmarks most widely used in SiS and DASiS, and briefly discuss related tasks such as instance and panoptic image segmentation, as well as applications such as medical image segmentation.
