Recent real-time semantic segmentation methods usually adopt an additional semantic branch to capture rich long-range context. However, the additional branch incurs undesirable computational overhead and slows inference. To resolve this dilemma, we propose SCTNet, a single-branch CNN with transformer semantic information for real-time segmentation. SCTNet enjoys the rich semantic representations of an inference-free semantic branch while retaining the high efficiency of a lightweight single-branch CNN. SCTNet uses a transformer as the training-only semantic branch because of its strong ability to extract long-range context. With the help of the proposed transformer-like CNN block, CFBlock, and the semantic information alignment module, SCTNet can capture the rich semantic information of the transformer branch during training. During inference, only the single-branch CNN needs to be deployed. We conduct extensive experiments on Cityscapes, ADE20K, and COCO-Stuff-10K, and the results show that our method achieves new state-of-the-art performance. The code and model are available at https://github.com/xzz777/SCTNet
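The split between a training-only semantic branch and a deployed single-branch CNN can be illustrated with a toy sketch. This is not the SCTNet implementation: the class `ToySingleBranchModel`, the scalar "weights", and the use of mean squared error as the alignment loss are all assumptions made for illustration, and real features would be tensors produced by CFBlock stages and a transformer rather than scaled lists.

```python
# Illustrative sketch only. All names and the MSE alignment loss are
# assumptions for this example, not the SCTNet code.
from typing import List, Optional, Tuple

Vector = List[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equally sized feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


class ToySingleBranchModel:
    """Toy stand-in: a deployed CNN branch plus a training-only teacher branch."""

    def __init__(self, cnn_scale: float, teacher_scale: float) -> None:
        self.cnn_scale = cnn_scale          # stand-in for CNN branch weights
        self.teacher_scale = teacher_scale  # stand-in for transformer weights
        self.teacher_calls = 0              # counts teacher evaluations

    def cnn_features(self, x: Vector) -> Vector:
        return [self.cnn_scale * v for v in x]

    def teacher_features(self, x: Vector) -> Vector:
        self.teacher_calls += 1
        return [self.teacher_scale * v for v in x]

    def forward(self, x: Vector, training: bool = False) -> Tuple[Vector, Optional[float]]:
        feats = self.cnn_features(x)
        if not training:
            # Inference path: the teacher branch is never evaluated.
            return feats, None
        # Training path: align CNN features with the teacher's features.
        align_loss = mse(feats, self.teacher_features(x))
        return feats, align_loss


model = ToySingleBranchModel(cnn_scale=1.0, teacher_scale=2.0)
train_feats, train_loss = model.forward([1.0, 2.0], training=True)
infer_feats, infer_loss = model.forward([1.0, 2.0], training=False)
```

In this sketch the alignment loss would be added to the usual segmentation loss during training, while the inference call returns the same CNN features without touching the teacher, which is the property the abstract describes as an inference-free semantic branch.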