Segmenting drivable roads and negative obstacles is critical to the safe operation of autonomous vehicles. Many multi-modal fusion methods, such as those fusing RGB and depth images, have been proposed to improve segmentation accuracy. However, we find that when the two modalities are fused with untrustworthy features, the performance of a multi-modal network can degrade, even falling below that of networks using a single modality. In this paper, untrustworthy features are those extracted from regions of a depth image with invalid depth data (i.e., a pixel value of 0), for example distant objects beyond the depth measurement range. Such features can confuse the network and hence lead to inferior segmentation results. To address this issue, we propose the Adaptive-Mask Fusion Network (AMFNet), which introduces adaptive-weight masks into the fusion module to fuse inconsistent features from RGB and depth images. In addition, we release a large-scale RGB-depth dataset with manually labeled ground truth, based on the NPO dataset, for drivable-road and negative-obstacle segmentation. Extensive experimental results demonstrate that our network achieves state-of-the-art performance compared with other networks. Our code and dataset are available at: https://github.com/lab-sun/AMFNet.
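To make the notion of invalid depth concrete, the following minimal sketch marks zero-valued depth pixels as untrustworthy and suppresses the depth features at those locations before adding them to the RGB features. It is not the AMFNet fusion module: the function names `validity_mask` and `masked_fusion`, the toy values, and the fixed binary mask are all assumptions made for illustration, whereas the abstract describes adaptive-weight masks.

```python
def validity_mask(depth):
    """Return 1.0 where a depth pixel is valid (non-zero) and 0.0 where it is invalid."""
    return [[0.0 if d == 0 else 1.0 for d in row] for row in depth]


def masked_fusion(rgb_feat, depth_feat, mask):
    """Add depth features to RGB features, scaled per pixel by the mask.

    Hypothetical helper: a fixed binary mask stands in for the
    adaptive weights described in the abstract.
    """
    return [
        [r + m * d for r, d, m in zip(r_row, d_row, m_row)]
        for r_row, d_row, m_row in zip(rgb_feat, depth_feat, mask)
    ]


# Toy 2x2 inputs (assumed values); a depth of 0 marks an invalid measurement.
depth = [[1200, 0], [0, 800]]
rgb_feat = [[0.5, 0.2], [0.1, 0.9]]
depth_feat = [[0.3, 0.7], [0.4, 0.1]]

mask = validity_mask(depth)
fused = masked_fusion(rgb_feat, depth_feat, mask)
```

At the two pixels with zero depth, the fused value equals the RGB feature alone, so the untrustworthy depth features do not influence the result there.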