Semantic segmentation is a cornerstone of scene understanding in autonomous driving, but it still faces significant challenges under complex conditions such as occlusion. Light field and LiDAR modalities provide complementary visual and spatial cues that benefit robust perception; however, their effective integration is hindered by limited viewpoint diversity and inherent modality discrepancies. To address these challenges, we present the first multimodal semantic segmentation dataset that integrates light field data and point cloud data. Based on this dataset, we propose a multimodal light field point-cloud fusion segmentation network (Mlpfseg) that incorporates feature completion and depth perception to segment camera images and LiDAR point clouds simultaneously. The feature completion module addresses the density mismatch between point clouds and image pixels by performing differential reconstruction of point-cloud feature maps, which enhances the fusion of the two modalities. The depth perception module improves the segmentation of occluded objects by reinforcing attention scores for better occlusion awareness. Our method outperforms image-only segmentation by 1.71 mean Intersection over Union (mIoU) and point cloud-only segmentation by 2.38 mIoU, demonstrating its effectiveness.
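The reported gains are stated in mIoU. For readers unfamiliar with the metric, the following is a minimal sketch of the standard definition (per-class intersection over union, averaged over the classes that occur), written with NumPy. It is not the authors' evaluation code: the function name `mean_iou`, the `ignore_index` parameter, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean Intersection over Union over the classes that occur.

    Hypothetical helper: assumes `pred` holds labels in [0, num_classes)
    and `target` holds the same, except for an optional `ignore_index`.
    """
    pred = np.asarray(pred).ravel().astype(np.int64)
    target = np.asarray(target).ravel().astype(np.int64)

    # Drop samples (pixels or points) whose ground truth is unlabeled.
    if ignore_index is not None:
        keep = target != ignore_index
        pred, target = pred[keep], target[keep]

    # confusion[i, j] counts samples of true class i predicted as class j.
    confusion = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection

    # Average only over classes present in the prediction or ground truth.
    present = union > 0
    if not present.any():
        return float("nan")
    return float((intersection[present] / union[present]).mean())


if __name__ == "__main__":
    target = [0, 0, 1, 1, 2, 2]
    pred = [0, 1, 1, 1, 2, 0]
    print(mean_iou(pred, target, num_classes=3))
```

The same function applies to both outputs the abstract mentions, since image pixels and LiDAR points are each flattened into a one-dimensional list of labels before counting.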