A Threefold Review on Deep Semantic Segmentation: Efficiency-oriented, Temporal and Depth-aware design

Semantic image and video segmentation are among the most important tasks in computer vision today, since they provide a complete and meaningful representation of the environment through a dense classification of the pixels in a given scene. Recently, Deep Learning, and more precisely Convolutional Neural Networks, have raised semantic segmentation to a new level in terms of performance and generalization capabilities. However, designing Deep Semantic Segmentation models is a complex task, as it may involve application-dependent aspects. In particular, autonomous driving applications require taking into account the robustness-efficiency trade-off, intrinsic limitations (computational and memory bounds, data scarcity), and constraints such as real-time inference. In this respect, the use of additional data modalities, such as depth perception for reasoning about the geometry of a scene and temporal cues from videos to explore redundancy and consistency, is a promising direction that has not yet been explored to its full potential in the literature. In this paper, we survey the most relevant and recent advances in Deep Semantic Segmentation in the context of vision for autonomous vehicles, from three different perspectives: efficiency-oriented model development for real-time operation, RGB-Depth data integration (RGB-D semantic segmentation), and the use of temporal information from videos in temporally-aware models. Our main objective is to provide a comprehensive discussion of the main methods, advantages, limitations, results, and challenges faced from each perspective, so that the reader can not only get started, but also keep up to date with recent advances in this exciting and challenging research field.

