Transformer-based semantic segmentation approaches, which divide an image into regions using sliding windows and model the relations inside each window, have achieved outstanding success. However, previous work did not focus on modeling the relations between windows, so these relations remain underexploited. To address this issue, we propose Graph-Segmenter, an effective network comprising a Graph Transformer and a Boundary-aware Attention module. It simultaneously models the deeper relations between windows in a global view and among the pixels inside each window in a local view, and performs substantial boundary adjustment at low cost. Specifically, we treat every window, and every pixel inside a window, as a node to construct a graph for each view, and devise the Graph Transformer on these graphs. The Boundary-aware Attention module refines the edge information of target objects by modeling the relationships among pixels on object edges. Extensive experiments on three widely used semantic segmentation datasets (Cityscapes, ADE-20k, and PASCAL Context) demonstrate that the proposed network, a Graph Transformer with Boundary-aware Attention, achieves state-of-the-art segmentation performance.
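The two-view idea in the abstract (pixels as nodes inside each window, windows as nodes across the image) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes non-overlapping windows instead of sliding ones, fully connected graphs whose edge weights are softmax-normalized scaled dot products, mean pooling to summarize a window as a node, and additive fusion of the two views. Every function name here (`graph_attention`, `partition_windows`, `two_view_graph_step`) is hypothetical, and there are no learned projections.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def graph_attention(nodes):
    """Attention over a fully connected graph (an assumption of this sketch).

    Each node aggregates all nodes, weighted by the softmax of scaled
    dot-product similarity.
    """
    scale = math.sqrt(len(nodes[0]))
    out = []
    for q in nodes:
        weights = softmax([dot(q, k) / scale for k in nodes])
        out.append([sum(w * v[d] for w, v in zip(weights, nodes))
                    for d in range(len(q))])
    return out


def partition_windows(feat, win):
    """Split an H x W grid of feature vectors into win x win windows.

    Assumes H and W are divisible by win. Returns a list of windows,
    each a flat list of pixel feature vectors.
    """
    h, w = len(feat), len(feat[0])
    windows = []
    for top in range(0, h, win):
        for left in range(0, w, win):
            windows.append([feat[r][c]
                            for r in range(top, top + win)
                            for c in range(left, left + win)])
    return windows


def mean_vector(vectors):
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def two_view_graph_step(feat, win):
    windows = partition_windows(feat, win)
    # Local view: the pixels inside each window are the graph nodes.
    local = [graph_attention(pixels) for pixels in windows]
    # Global view: each window becomes one node (mean pooling is an assumption).
    window_nodes = [mean_vector(pixels) for pixels in local]
    global_nodes = graph_attention(window_nodes)
    # Fusion by addition (an assumption): add window context to its pixels.
    return [[[p + g for p, g in zip(pixel, ctx)] for pixel in pixels]
            for pixels, ctx in zip(local, global_nodes)]
```

For a 4 x 4 feature map with `win=2`, `two_view_graph_step` returns four windows of four pixel vectors each. The local step costs time quadratic in the number of pixels per window, and the global step costs time quadratic in the number of windows, which is the usual motivation for window-based attention.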