RGB-T semantic segmentation is widely used to handle difficult scenes with poor lighting by fusing features from the RGB and thermal modalities. Existing methods seek a single optimal fusion feature for segmentation, which makes them sensitive to modality noise, class imbalance, and modality bias. To overcome these problems, this paper proposes a novel Variational Probabilistic Fusion Network (VPFNet), which treats fusion features as random variables and obtains robust segmentation by averaging the segmentation results over multiple samples of the fusion features. The random samples of fusion features in VPFNet are generated by a novel Variational Feature Fusion Module (VFFM) based on variational attention. To further mitigate class imbalance and modality bias, we employ a weighted cross-entropy loss and introduce prior information on illumination and category to control the proposed VFFM. Experimental results on the MFNet and PST900 datasets demonstrate that the proposed VPFNet achieves state-of-the-art segmentation performance.
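The core idea of treating fusion features as random variables and averaging the predictions over several samples can be illustrated with a minimal NumPy sketch. This is not the authors' VFFM: the sigmoid-gated convex combination of RGB and thermal features, the Gaussian reparameterization of the attention logits, the linear classification head, and all names (`sample_fused_features`, `averaged_segmentation`, `weighted_cross_entropy`, `mu`, `log_var`, `head`) are assumptions made for illustration. In a real model, `mu` and `log_var` would be predicted by a network, and the illumination and category priors mentioned in the abstract are omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_fused_features(rgb, thermal, mu, log_var, rng):
    # Hypothetical fusion rule: sample attention logits from N(mu, exp(log_var))
    # with the reparameterization trick, squash them into (0, 1), and use them
    # as per-element weights between the two modalities.
    eps = rng.standard_normal(mu.shape)
    w = sigmoid(mu + np.exp(0.5 * log_var) * eps)
    return w * rgb + (1.0 - w) * thermal

def averaged_segmentation(rgb, thermal, mu, log_var, head, num_samples, rng):
    # Average per-pixel class probabilities over several fusion samples.
    probs = np.zeros(rgb.shape[:-1] + (head.shape[1],))
    for _ in range(num_samples):
        fused = sample_fused_features(rgb, thermal, mu, log_var, rng)
        probs += softmax(fused @ head)  # assumed linear segmentation head
    return probs / num_samples

def weighted_cross_entropy(probs, labels, class_weights):
    # Class-weighted cross-entropy, normalized by the sum of the applied weights.
    flat = probs.reshape(-1, probs.shape[-1])
    y = labels.reshape(-1)
    picked = flat[np.arange(y.size), y]
    w = class_weights[y]
    return float(-(w * np.log(picked + 1e-12)).sum() / w.sum())

# Toy inputs: a 4x4 feature map with 8 channels and 3 classes (all synthetic).
rng = np.random.default_rng(0)
H, W, C, NUM_CLASSES = 4, 4, 8, 3
rgb = rng.standard_normal((H, W, C))
thermal = rng.standard_normal((H, W, C))
mu = np.zeros((H, W, C))            # stand-in for predicted attention mean
log_var = np.full((H, W, C), -1.0)  # stand-in for predicted log-variance
head = rng.standard_normal((C, NUM_CLASSES))

probs = averaged_segmentation(rgb, thermal, mu, log_var, head, num_samples=16, rng=rng)
labels = rng.integers(0, NUM_CLASSES, size=(H, W))
loss = weighted_cross_entropy(probs, labels, np.array([0.2, 1.0, 2.0]))
```

Averaging probabilities rather than committing to one fused feature is what makes the prediction less dependent on any single noisy fusion; the class weights in the loss are arbitrary placeholders that give rare classes more influence.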