Coupling Global Context and Local Contents for Weakly-Supervised Semantic Segmentation

Weakly-Supervised Semantic Segmentation (WSSS) has been studied extensively because it needs only easy-to-obtain annotations yet delivers satisfactory performance. Recently, single-stage WSSS has been revisited to alleviate the expensive computation and complicated training procedures of multi-stage WSSS. However, the results of such immature models suffer from background incompleteness and object incompleteness. We empirically find that these problems are caused by insufficient global object context and a lack of local regional contents, respectively. Based on these observations, we propose a single-stage WSSS model supervised only by image-level class labels, termed the Weakly-Supervised Feature Coupling Network (WS-FCN), which captures the multi-scale context formed from adjacent feature grids and encodes fine-grained spatial information from low-level features into high-level ones. Specifically, a flexible context aggregation module is proposed to capture the global object context in different granular spaces. In addition, a semantically consistent feature fusion module is proposed to aggregate the fine-grained local contents in a bottom-up, parameter-learnable fashion. Based on these two modules, WS-FCN is trained end-to-end in a self-supervised fashion. Extensive experiments on the challenging PASCAL VOC 2012 and MS COCO 2014 datasets demonstrate the effectiveness and efficiency of WS-FCN, which achieves state-of-the-art results of 65.02% and 64.22% mIoU on the PASCAL VOC 2012 val and test sets, respectively, and 34.12% mIoU on the MS COCO 2014 val set. The code and weights have been released at: https://github.com/ChunyanWang1/ws-fcn.
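
For readers unfamiliar with the reported metric, mean Intersection-over-Union (mIoU) averages, over classes, the ratio of correctly labeled pixels to the union of predicted and ground-truth pixels for that class. The following is a minimal sketch of that definition in plain Python, not the evaluation code from the released repository. The function name `mean_iou`, the nested-list label maps, and the `ignore_index` value of 255 are assumptions made for illustration. Benchmark evaluation typically accumulates the counts over the whole dataset rather than over a single image.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
    num_classes: int,
    ignore_index: int = 255,  # assumed "void" label; skipped during scoring
) -> float:
    """Mean IoU over the classes present in the prediction or the ground truth."""
    inter = [0] * num_classes  # true positives per class
    union = [0] * num_classes  # TP + FP + FN per class
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t == ignore_index:
                continue
            if p == t:
                inter[p] += 1
                union[p] += 1
            else:
                union[p] += 1  # false positive for the predicted class
                union[t] += 1  # false negative for the true class
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x2 label maps with two classes (hypothetical data).
pred = [[0, 0], [1, 1]]
target = [[0, 1], [1, 1]]
score = mean_iou(pred, target, num_classes=2)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so `score` is their average, 7/12.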

