Long-tailed distribution of semantic categories, which has been often ignored in conventional methods, causes unsatisfactory performance in semantic segmentation on tail categories. In this paper, we focus on the problem of long-tailed semantic segmentation. Although some long-tailed recognition methods (e.g., re-sampling/re-weighting) have been proposed in other problems, they can probably compromise crucial contextual information and are thus hardly adaptable to the problem of long-tailed semantic segmentation. To address this issue, we propose MEDOE, a novel framework for long-tailed semantic segmentation via contextual information ensemble-and-grouping. The proposed two-sage framework comprises a multi-expert decoder (MED) and a multi-expert output ensemble (MOE). Specifically, the MED includes several "experts". Based on the pixel frequency distribution, each expert takes the dataset masked according to the specific categories as input and generates contextual information self-adaptively for classification; The MOE adopts learnable decision weights for the ensemble of the experts' outputs. As a model-agnostic framework, our MEDOE can be flexibly and efficiently coupled with various popular deep neural networks (e.g., DeepLabv3+, OCRNet, and PSPNet) to improve their performance in long-tailed semantic segmentation. Experimental results show that the proposed framework outperforms the current methods on both Cityscapes and ADE20K datasets by up to 1.78% in mIoU and 5.89% in mAcc.
翻译:传统方法常忽略语义类别的长尾分布问题,导致尾类别语义分割性能不佳。本文聚焦于长尾语义分割问题。尽管其他领域已提出重采样/重加权等长尾识别方法,但这些方法可能破坏关键上下文信息,难以适配长尾语义分割问题。为此,我们提出MEDOE——一种基于上下文信息集成与分组的长尾语义分割新型框架。该两阶段框架包含多专家解码器(MED)与多专家输出集成(MOE)。具体而言,MED包含多个"专家",每个专家根据像素频率分布,以按特定类别掩码的数据集为输入,自适应生成用于分类的上下文信息;MOE采用可学习的决策权重对专家输出进行集成。作为一种模型无关框架,MEDOE可灵活高效地与主流深度神经网络(如DeepLabv3+、OCRNet、PSPNet)结合,提升其在长尾语义分割中的性能。实验结果表明,该框架在Cityscapes和ADE20K数据集上较现有方法在mIoU和mAcc指标上分别最高提升1.78%和5.89%。