Modern semantic segmentation methods devote much effort to adjusting image feature representations to improve segmentation performance, for example through architecture design and attention mechanisms. However, almost all of these methods neglect the particularity of the class weights (in the classification layer) of segmentation models. In this paper, we observe that the class weights of categories that tend to share many adjacent boundary pixels lack discrimination, which limits performance. We call this issue Boundary-caused Class Weights Confusion (BCWC). We focus on this problem and propose a novel method named Embedded Conditional Random Field (E-CRF) to alleviate it. E-CRF fuses the CRF into the CNN as an organic whole for more effective end-to-end optimization. The benefit is twofold. First, it uses the CRF to guide message passing between pixels in high-level features, purifying the feature representation of boundary pixels with the help of inner pixels belonging to the same object. Second, and more importantly, it enables the class weights to be optimized in both scale and direction during backpropagation. We provide a detailed theoretical analysis to prove this. In addition, superpixels are integrated into E-CRF and serve as an auxiliary that exploits the local object prior for more reliable message passing. Finally, the proposed method yields impressive results on the ADE20K, Cityscapes, and Pascal Context datasets.
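To make the idea of feature-level message passing with a superpixel prior concrete, the following is a minimal NumPy sketch and not the E-CRF formulation itself. Everything in it is an assumption made for illustration: the function name `refine_features`, the Gaussian affinity on feature distance, the additive fusion, and the parameters `bandwidth` and `sp_weight` do not come from the paper. It only shows how inner pixels of an object could pull a mixed boundary feature toward their own representation.

```python
import numpy as np


def refine_features(feats, superpixels, bandwidth=1.0, sp_weight=0.5):
    """Illustrative (hypothetical) message passing over pixel features.

    feats:       (N, C) array, one high-level feature vector per pixel.
    superpixels: (N,) integer array, superpixel id of each pixel.
    Returns an (N, C) array of refined features.
    """
    feats = np.asarray(feats, dtype=np.float64)
    superpixels = np.asarray(superpixels).ravel()

    # Pairwise affinity: Gaussian kernel on squared feature distance
    # (an assumed kernel, chosen only for simplicity).
    sq_dist = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(axis=-1)
    affinity = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    np.fill_diagonal(affinity, 0.0)  # a pixel sends no message to itself
    affinity /= np.maximum(affinity.sum(axis=1, keepdims=True), 1e-12)

    # Pairwise message: affinity-weighted average of the other pixels' features.
    message = affinity @ feats

    # Superpixel prior: mean feature of the superpixel each pixel belongs to.
    ids, inverse = np.unique(superpixels, return_inverse=True)
    sums = np.zeros((len(ids), feats.shape[1]))
    np.add.at(sums, inverse, feats)
    counts = np.bincount(inverse, minlength=len(ids))[:, None]
    sp_mean = (sums / counts)[inverse]

    # Additive fusion of the original feature, the message, and the prior
    # (the fusion rule is an assumption of this sketch).
    return feats + message + sp_weight * sp_mean


if __name__ == "__main__":
    # Two inner pixels of object A, one boundary pixel of A with a mixed
    # feature, and two inner pixels of object B.
    feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.6, 0.4], [0.0, 1.0], [0.0, 1.0]])
    superpixels = np.array([0, 0, 0, 1, 1])
    print(refine_features(feats, superpixels))
```

In this toy setup the boundary pixel (index 2) receives more weight from the inner pixels of its own object, and its superpixel mean is dominated by them, so its refined feature points more strongly in that object's direction than the original mixed feature did.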