Multi-modality fusion has proven to be an effective method for 3D perception in autonomous driving. However, most current multi-modality fusion pipelines for LiDAR semantic segmentation rely on complicated fusion mechanisms. Point painting is a straightforward method that directly binds LiDAR points with visual information. Unfortunately, previous point-painting-style methods suffer from projection error between the camera and the LiDAR. In our experiments, we find that this projection error is the devil in point painting. Consequently, we propose a depth-aware point painting mechanism, which significantly boosts multi-modality fusion. Beyond that, we take a deeper look at the visual features best suited for LiDAR semantic segmentation. By Lifting Visual Information as Cue, LVIC ranks 1st on the nuScenes LiDAR semantic segmentation benchmark. Our experiments demonstrate its robustness and effectiveness. Code will be made publicly available soon.
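For context, the vanilla point painting pipeline the abstract refers to can be sketched as follows: each LiDAR point is projected into the camera image with the calibration matrices, and the visual features (e.g. per-pixel semantic scores) at the projected pixel are appended to the point. This is a minimal illustration of generic point painting, not the paper's depth-aware variant; the function name, argument layout, and zero-fill for out-of-view points are our own assumptions.

```python
import numpy as np

def paint_points(points, image_feats, K, T_cam_from_lidar):
    """Attach per-pixel visual features to LiDAR points (generic point painting sketch).

    points:           (N, 3) LiDAR points in the LiDAR frame.
    image_feats:      (H, W, C) visual features from the camera, e.g. semantic scores.
    K:                (3, 3) camera intrinsic matrix.
    T_cam_from_lidar: (4, 4) extrinsic transform from LiDAR to camera frame.
    Returns (N, 3 + C) painted points; points outside the image view get zero features.
    """
    N = points.shape[0]
    H, W, C = image_feats.shape
    # Transform points into the camera frame (homogeneous coordinates).
    pts_h = np.hstack([points, np.ones((N, 1))])
    cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]
    # Keep only points in front of the camera.
    valid = cam[:, 2] > 1e-6
    # Perspective projection to pixel coordinates.
    uvw = (K @ cam.T).T
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    inside = valid & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    # Sample image features at the projected pixels; occlusion is ignored here,
    # which is exactly the kind of projection error the paper targets.
    feats = np.zeros((N, C), dtype=image_feats.dtype)
    feats[inside] = image_feats[v[inside], u[inside]]
    return np.hstack([points, feats])
```

Note that this nearest-pixel lookup ignores occlusion and calibration noise, so a foreground point can be painted with background pixels (and vice versa); that mismatch is the projection error identified above as the main failure mode of point painting.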