The availability of Depth data allows convolutional neural networks (CNNs) to advance RGB-D semantic segmentation. Objects that are hard to distinguish by 2D appearance alone can, in some cases, be separated well using the local pixel differences and geometric patterns in Depth. Because of their fixed grid kernel structure, however, CNNs have limited ability to capture detailed, fine-grained information, which limits the accuracy of pixel-level semantic segmentation. To solve this problem, we propose a Pixel Difference Convolutional Network (PDCNet) that captures detailed intrinsic patterns by aggregating both intensity and gradient information, in a local range for Depth data and in a global range for RGB data. Specifically, PDCNet consists of a Depth branch and an RGB branch. For the Depth branch, we propose a Pixel Difference Convolution (PDC) that captures local, detailed geometric information in Depth data by aggregating both intensity and gradient information. For the RGB branch, we contribute a lightweight Cascade Large Kernel (CLK) that extends PDC into CPDC, which captures global context in RGB data and further boosts performance. Consequently, the local and global pixel differences of both modalities are seamlessly incorporated into PDCNet during information propagation. Experiments on two challenging benchmark datasets, NYUDv2 and SUN RGB-D, show that PDCNet achieves state-of-the-art performance on the semantic segmentation task.
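To make the idea of aggregating intensity and gradient information concrete, here is a minimal sketch of a pixel-difference-style convolution on a single-channel Depth patch. The abstract does not give the exact PDC formula, so the blend below is an assumption modeled on central-difference convolutions: an intensity term (ordinary convolution) and a gradient term (convolution over differences to the center pixel), mixed by a hypothetical weight `theta`. The function names and the default `theta=0.7` are illustrative and are not taken from PDCNet.

```python
def vanilla_conv2d(x, w):
    """Valid (no padding) 2D cross-correlation of image x with kernel w."""
    kh, kw = len(w), len(w[0])
    out_h = len(x) - kh + 1
    out_w = len(x[0]) - kw + 1
    return [
        [
            sum(w[i][j] * x[r + i][c + j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def pixel_difference_conv2d(x, w, theta=0.7):
    """Blend an intensity term with a gradient (pixel-difference) term.

    Assumed form (not confirmed by the source):
        out = (1 - theta) * sum_n w_n * x_n
              + theta     * sum_n w_n * (x_n - x_center)
    """
    kh, kw = len(w), len(w[0])
    assert kh % 2 == 1 and kw % 2 == 1, "kernel needs a well-defined center"
    w_sum = sum(sum(row) for row in w)

    intensity = vanilla_conv2d(x, w)  # sum_n w_n * x_n at every location
    out = []
    for r, row in enumerate(intensity):
        out_row = []
        for c, val in enumerate(row):
            center = x[r + kh // 2][c + kw // 2]
            # sum_n w_n * (x_n - x_center) == val - x_center * sum_n w_n
            gradient = val - center * w_sum
            out_row.append((1 - theta) * val + theta * gradient)
        out.append(out_row)
    return out


# A flat depth patch and a patch containing a depth step on its right edge.
flat_patch = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
step_patch = [[1, 1, 2], [1, 1, 2], [1, 1, 2]]
ones_kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

flat_response = pixel_difference_conv2d(flat_patch, ones_kernel, theta=1.0)
step_response = pixel_difference_conv2d(step_patch, ones_kernel, theta=1.0)
```

With `theta=1.0` only the gradient term remains, so the flat patch produces a zero response while the depth step does not; with `theta=0.0` the function reduces to an ordinary convolution. This is the sense in which a difference term can emphasize local geometric changes in Depth that a plain intensity aggregation would average over.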