LiDAR-based 3D object detection and panoptic segmentation are two crucial tasks in the perception systems of autonomous vehicles and robots. In this paper, we propose the All-in-One Perception Network (AOP-Net), a LiDAR-based multi-task framework that combines 3D object detection and panoptic segmentation. A dual-task 3D backbone extracts both panoptic- and detection-level features from the input LiDAR point cloud. A new 2D backbone that intertwines Multi-Layer Perceptron (MLP) and convolution layers further improves detection performance. Finally, a novel module guides the detection head by recovering useful features discarded during down-sampling in the 3D backbone; it leverages estimated instance segmentation masks to recover detailed information for each candidate object. AOP-Net achieves state-of-the-art performance among published works on the nuScenes benchmark for both 3D object detection and panoptic segmentation. Experiments also show that our method adapts easily to any bird's-eye-view (BEV)-based 3D object detection method and significantly improves its performance.
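To make the mask-guided recovery idea concrete, the following is a minimal sketch and not the AOP-Net module itself. It assumes per-point fine-grained features and predicted instance ids are already available, averages the features inside each instance mask, and writes the result onto the BEV cells that instance occupies. The function name `paint_instance_features`, the square grid, the convention that id 0 means "no instance", and the grid origin at (0, 0) are all illustrative assumptions.

```python
import numpy as np


def paint_instance_features(point_xy, point_feats, instance_ids, grid_size, cell):
    """Write each instance's mean point feature onto the BEV cells it covers.

    point_xy:     (N, 2) x/y coordinates, assumed to start at the grid origin (0, 0).
    point_feats:  (N, C) fine-grained per-point features.
    instance_ids: (N,) predicted instance ids; 0 is assumed to mean "no instance".
    Returns a (grid_size, grid_size, C) array indexed as [row (y), col (x)].
    """
    point_xy = np.asarray(point_xy, dtype=float)
    point_feats = np.asarray(point_feats, dtype=float)
    instance_ids = np.asarray(instance_ids)

    bev = np.zeros((grid_size, grid_size, point_feats.shape[1]))

    # Discretise coordinates into BEV cell indices.
    cols = np.floor(point_xy[:, 0] / cell).astype(int)
    rows = np.floor(point_xy[:, 1] / cell).astype(int)
    in_range = (rows >= 0) & (rows < grid_size) & (cols >= 0) & (cols < grid_size)

    for inst in np.unique(instance_ids):
        if inst == 0:  # skip points without an instance (assumed convention)
            continue
        mask = (instance_ids == inst) & in_range
        if not mask.any():
            continue
        # Mean of the fine-grained features inside this instance mask,
        # broadcast to every BEV cell the instance touches.
        bev[rows[mask], cols[mask]] = point_feats[mask].mean(axis=0)
    return bev


xy = [[0.5, 0.5], [1.5, 0.5], [3.5, 3.5], [2.5, 2.5]]
feats = [[1.0, 0.0], [3.0, 0.0], [0.0, 4.0], [9.0, 9.0]]
ids = [1, 1, 2, 0]
bev_map = paint_instance_features(xy, feats, ids, grid_size=4, cell=1.0)
```

In a detector, a map like `bev_map` could be concatenated channel-wise with the down-sampled BEV features before the detection head; how AOP-Net actually fuses the recovered features is not specified in the abstract.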