Multimodal semantic segmentation is developing rapidly, but the RGB-Polarization (RGB-P) modality remains underexplored. To address this gap, we construct UPLight, an RGB-P segmentation benchmark with 12 typical underwater semantic classes, which provides data support for Autonomous Underwater Vehicles (AUVs) performing special perception tasks. We propose ShareCMP, an RGB-P semantic segmentation framework with a shared dual-branch architecture that reduces the number of parameters by about 26-33% compared to previous dual-branch models. It includes a Polarization Generate Attention (PGA) module designed to generate polarization modal images with richer polarization properties for the encoder. In addition, we introduce the Class Polarization-Aware Loss (CPALoss) to improve the encoder's learning and understanding of polarization modal information and to optimize the PGA module. In extensive experiments on three RGB-P benchmarks, ShareCMP achieves state-of-the-art mIoU with fewer parameters on the UPLight (92.45%), ZJU (92.7%), and MCubeS (50.99%) datasets. The code is available at https://github.com/LEFTeyex/ShareCMP.
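The results above are reported in mIoU (mean Intersection over Union). As a reminder of what that metric measures, the following is a minimal sketch of the standard definition, computed from a confusion matrix over flattened label sequences. The function names `confusion_matrix` and `mean_iou` are hypothetical and are not taken from the ShareCMP repository; the paper's actual evaluation protocol (for example, how ignored labels are handled or how counts are accumulated across a dataset) may differ.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows index the ground-truth class, columns the prediction."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    return cm


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int) -> float:
    """Average per-class IoU = TP / (TP + FP + FN).

    Assumption: classes absent from both prediction and ground truth
    are skipped rather than counted as IoU 0.
    """
    cm = confusion_matrix(pred, target, num_classes)
    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(num_classes)) - tp  # predicted c, truly other
        fn = sum(cm[c]) - tp                                 # truly c, predicted other
        denom = tp + fp + fn
        if denom == 0:
            continue
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with 2 classes and 4 pixels (flattened label maps).
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. In practice the confusion matrix is usually accumulated over the whole test set before the per-class IoU is taken.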