Recent real-time semantic segmentation methods usually adopt an additional semantic branch to capture rich long-range context. However, the additional branch incurs undesirable computational overhead and slows inference. To resolve this dilemma, we propose SCTNet, a single-branch CNN with transformer semantic information for real-time segmentation. SCTNet enjoys the rich semantic representations of an inference-free semantic branch while retaining the high efficiency of a lightweight single-branch CNN. SCTNet uses a transformer as the training-only semantic branch because of its strong ability to extract long-range context. With the help of the proposed transformer-like CNN block, CFBlock, and the semantic information alignment module, SCTNet can capture rich semantic information from the transformer branch during training. During inference, only the single-branch CNN needs to be deployed. We conduct extensive experiments on Cityscapes, ADE20K, and COCO-Stuff-10K, and the results show that our method achieves new state-of-the-art performance. The code and model are available at https://github.com/xzz777/SCTNet
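The training-only semantic branch described in the abstract can be illustrated with a toy, framework-free sketch. It is not the SCTNet implementation: the abstract does not specify CFBlock or the semantic information alignment module, so the sketch assumes a plain mean-squared-error alignment loss and omits the segmentation loss entirely. The names `StudentBranch`, `TeacherBranch`, `train_step`, and `infer` are hypothetical. The point is only the control flow: the teacher branch is evaluated during training to guide the student's features, and is never called at inference.

```python
from typing import List, Sequence

Vector = List[float]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    assert len(a) == len(b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


class StudentBranch:
    """Hypothetical stand-in for the single-branch CNN: one learnable scale per channel."""

    def __init__(self, channels: int) -> None:
        self.weights: Vector = [0.0] * channels

    def features(self, x: Sequence[float]) -> Vector:
        return [w * xi for w, xi in zip(self.weights, x)]


class TeacherBranch:
    """Hypothetical stand-in for the transformer branch: fixed, training-only."""

    def __init__(self, scales: Sequence[float]) -> None:
        self.scales: Vector = list(scales)
        self.calls = 0  # counts how often the teacher is evaluated

    def features(self, x: Sequence[float]) -> Vector:
        self.calls += 1
        return [s * xi for s, xi in zip(self.scales, x)]


def train_step(student: StudentBranch, teacher: TeacherBranch,
               x: Sequence[float], lr: float = 0.1) -> float:
    """One gradient-descent step on the (assumed) MSE alignment loss."""
    f_s = student.features(x)
    f_t = teacher.features(x)
    loss = mse(f_s, f_t)
    n = len(x)
    for i in range(n):
        # d/dw_i of mean((w_i * x_i - t_i)^2) = 2 * (w_i * x_i - t_i) * x_i / n
        grad = 2.0 * (f_s[i] - f_t[i]) * x[i] / n
        student.weights[i] -= lr * grad
    return loss


def infer(student: StudentBranch, x: Sequence[float]) -> Vector:
    """Inference path: only the student branch is evaluated."""
    return student.features(x)


student = StudentBranch(channels=3)
teacher = TeacherBranch(scales=[2.0, -1.0, 0.5])
x = [1.0, 2.0, 3.0]

losses = [train_step(student, teacher, x) for _ in range(200)]
calls_after_training = teacher.calls

out = infer(student, x)  # the teacher is not touched here
```

After training, the student's weights approximate the teacher's scales, and `teacher.calls` stays unchanged by `infer`, which mirrors the claim that only the single-branch network needs to be deployed.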