LiDAR is crucial for robust 3D scene perception in autonomous driving, and LiDAR perception has the largest body of literature after camera perception. However, multi-task learning across tasks such as detection, segmentation, and motion estimation using LiDAR remains relatively unexplored, especially on automotive-grade embedded platforms. We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantic segmentation, and motion segmentation. The unified architecture comprises a shared encoder and task-specific decoders, enabling joint representation learning. We propose a novel Semantic Weighting and Guidance (SWAG) module that selectively transfers semantic features to improve object detection. Our heterogeneous training scheme combines diverse datasets and exploits complementary cues between tasks. This work provides the first embedded implementation unifying these key perception tasks from LiDAR point clouds, achieving 3 ms latency on the embedded NVIDIA Xavier platform. We achieve state-of-the-art results for two tasks, semantic segmentation and motion segmentation, and close to state-of-the-art performance for 3D object detection. By maximizing hardware efficiency and leveraging multi-task synergies, our method delivers an accurate and efficient solution tailored for real-world automated driving deployment. Qualitative results can be seen at https://youtu.be/H-hWRzv2lIY.
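The shared-encoder, task-specific-decoder layout described above can be illustrated with a minimal sketch. It is not the authors' network: the abstract does not specify layer types, feature sizes, or the internals of the SWAG module, so everything below is an assumption made for illustration. Small dense layers on a single feature vector stand in for the convolutional encoder, the class name `MultiTaskSketch`, the dimensions, and the 7-value box encoding are hypothetical, and the sigmoid gate is only one plausible way to let semantic predictions weight the features passed to the detection head.

```python
import math
import random


def linear(x, w, b):
    """Dense layer: x is a feature vector, w a list of weight rows, b a bias vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def init(rng, n_out, n_in):
    """Random weights and zero biases for an n_in -> n_out dense layer."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class MultiTaskSketch:
    """Toy stand-in: one shared encoder feeding three task heads (all sizes assumed)."""

    def __init__(self, in_dim=4, feat_dim=8, n_classes=3, box_dim=7, seed=0):
        rng = random.Random(seed)
        self.enc = init(rng, feat_dim, in_dim)      # shared encoder
        self.sem = init(rng, n_classes, feat_dim)   # semantic segmentation head
        self.mot = init(rng, 1, feat_dim)           # motion segmentation head
        self.gate = init(rng, feat_dim, n_classes)  # hypothetical semantic gate
        self.det = init(rng, box_dim, feat_dim)     # detection head

    def forward(self, cell):
        # Shared features are computed once and reused by every head.
        shared = relu(linear(cell, *self.enc))
        sem_logits = linear(shared, *self.sem)
        motion = sigmoid(linear(shared, *self.mot))[0]
        # Assumed gating: semantic logits produce per-channel weights in (0, 1)
        # that scale the shared features before the detection head sees them.
        gate = sigmoid(linear(sem_logits, *self.gate))
        guided = [g * s for g, s in zip(gate, shared)]
        box = linear(guided, *self.det)
        return {"semantic": sem_logits, "motion": motion, "detection": box}


model = MultiTaskSketch()
out = model.forward([0.1, 0.2, 0.3, 0.4])
```

In this layout the encoder cost is paid once per input regardless of how many heads consume its output, which is the property that makes a multi-task design attractive on an embedded target.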