Within the perception framework of autonomous mobile and robotic systems, semantic analysis of 3D point clouds, typically generated by LiDAR sensors, is key to numerous applications such as object detection and recognition and scene reconstruction. Scene semantic segmentation can be achieved by directly processing 3D spatial data with specialized deep neural networks. Although this type of data provides rich geometric information about the surrounding environment, it also presents numerous challenges: its unstructured and sparse nature, its unpredictable size, and its demanding computational requirements. These characteristics hinder real-time semantic analysis, particularly on the resource-constrained hardware architectures that serve as the main computational components of many robotic applications. In this paper, we therefore investigate various 3D semantic segmentation methodologies and analyze their performance and capabilities for resource-constrained inference on embedded NVIDIA Jetson platforms. To ensure a fair comparison, we evaluate all methods under a standardized training protocol and identical data augmentations, providing benchmark results on the Jetson AGX Orin and AGX Xavier series for two large-scale outdoor datasets: SemanticKITTI and nuScenes.