Semantic segmentation is an important task for many applications, but achieving strong performance at limited computational cost remains challenging. In this paper, we present CGRSeg, an efficient yet competitive segmentation framework based on context-guided spatial feature reconstruction. A Rectangular Self-Calibration Module is designed for spatial feature reconstruction and pyramid context extraction. It captures global context in both the horizontal and vertical directions to obtain an axial global context, which explicitly models rectangular key areas. A shape self-calibration function then brings these key areas closer to the foreground object. In addition, a lightweight Dynamic Prototype Guided head is proposed to improve the classification of foreground objects through explicit class embedding. CGRSeg is extensively evaluated on the ADE20K, COCO-Stuff, and Pascal Context benchmarks and achieves state-of-the-art semantic segmentation performance. Specifically, it achieves $43.6\%$ mIoU on ADE20K with only $4.0$ GFLOPs, which is $0.9\%$ and $2.5\%$ mIoU better than SeaFormer and SegNeXt, respectively, with about $38.0\%$ fewer GFLOPs. Code is available at https://github.com/nizhenliang/CGRSeg.
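To make the idea of axial global context concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single-channel feature map stored as nested Python lists, uses plain averaging for the horizontal and vertical pooling, and combines the two by broadcast addition. The function names `axial_context` and `rectangular_mask` are hypothetical, and the fixed threshold is only a stand-in for the learned shape self-calibration function, which is omitted here.

```python
def axial_context(feature):
    """Combine row-wise and column-wise averages of an H x W feature map.

    Illustrative assumption: one channel, plain mean pooling, and
    broadcast addition of the two pooled vectors.
    """
    height = len(feature)
    width = len(feature[0])
    # Horizontal pooling: average each row over its width -> H values.
    row_context = [sum(row) / width for row in feature]
    # Vertical pooling: average each column over its height -> W values.
    col_context = [
        sum(feature[i][j] for i in range(height)) / height
        for j in range(width)
    ]
    # Broadcast addition: position (i, j) receives row i plus column j.
    return [
        [row_context[i] + col_context[j] for j in range(width)]
        for i in range(height)
    ]


def rectangular_mask(feature, threshold):
    """Mark positions whose axial context reaches the threshold.

    The threshold is a hypothetical stand-in for a learned calibration.
    """
    context = axial_context(feature)
    return [[1 if value >= threshold else 0 for value in row] for row in context]


# Toy 4 x 5 feature map with an active block at rows 1-2, columns 1-3.
feature = [[0.0] * 5 for _ in range(4)]
for i in (1, 2):
    for j in (1, 2, 3):
        feature[i][j] = 1.0

for row in rectangular_mask(feature, threshold=1.0):
    print(row)
```

In this toy input, only positions that lie in both an active row and an active column reach the threshold, so the highlighted region is the rectangle covering the active block. For general inputs a simple threshold on the summed context need not produce a single clean rectangle.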