Semantic segmentation is an essential technology that enables self-driving cars to understand their surroundings. Current real-time semantic segmentation networks commonly adopt either an encoder-decoder architecture or a two-pathway architecture. Generally speaking, encoder-decoder models tend to be faster, whereas two-pathway models achieve higher accuracy. To leverage the strengths of both, we present the Spatial-Assistant Encoder-Decoder Network (SANet), which fuses the two architectures. SANet retains the overall encoder-decoder design, but in the middle section of the encoder it preserves the feature maps and uses atrous convolution branches to extract features at the same resolution. At the end of the encoder, we integrate the asymmetric pooling pyramid pooling module (APPPM) to improve semantic extraction from the feature maps; this module incorporates asymmetric pooling layers that extract features at multiple resolutions. In the decoder, we present a hybrid attention module, SAD, which integrates horizontal and vertical attention to facilitate fusing the features of the different branches. To verify the effectiveness of our approach, we evaluate SANet on the CamVid and Cityscapes datasets, where it achieves competitive results for real-time segmentation. Using a single 2080Ti GPU, SANet achieves 78.4% mIoU at 65.1 FPS on the Cityscapes test set and 78.8% mIoU at 147 FPS on the CamVid test set. The training code and model for SANet are available at https://github.com/CuZaoo/SANet-main
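The idea that atrous (dilated) convolution branches can enlarge context without changing resolution follows from the standard convolution output-size formula. The sketch below is a minimal, framework-free illustration, assuming stride 1, odd kernel sizes, a hypothetical 1/8-resolution feature map, and hypothetical dilation rates (1, 2, 4, 8); SANet's actual branch configuration is not specified in this abstract, and the helper names are illustrative only.

```python
def effective_kernel_size(kernel, dilation):
    # A dilated kernel spans dilation * (kernel - 1) + 1 input positions per axis.
    return dilation * (kernel - 1) + 1


def conv_output_size(size, kernel, stride=1, padding=0, dilation=1):
    # Standard output-size formula for one spatial axis of a 2D convolution.
    return (size + 2 * padding - effective_kernel_size(kernel, dilation)) // stride + 1


def same_resolution_padding(kernel, dilation):
    # Hypothetical helper: for stride 1 and an odd kernel, this padding
    # makes the output size equal to the input size.
    if kernel % 2 == 0:
        raise ValueError("kernel size must be odd")
    return dilation * (kernel - 1) // 2


if __name__ == "__main__":
    # Hypothetical example: a 1/8-resolution map from a 1024x2048 Cityscapes image.
    height, width = 1024 // 8, 2048 // 8  # 128 x 256
    for dilation in (1, 2, 4, 8):  # hypothetical dilation rate for each branch
        pad = same_resolution_padding(3, dilation)
        out_h = conv_output_size(height, 3, padding=pad, dilation=dilation)
        out_w = conv_output_size(width, 3, padding=pad, dilation=dilation)
        span = effective_kernel_size(3, dilation)
        print(f"dilation={dilation}: padding={pad}, span={span}, output={out_h}x{out_w}")
```

Under these assumptions, each branch covers a progressively wider span (3, 5, 9, and 17 pixels per axis for a single 3x3 layer) while producing a 128x256 output, so the branch outputs can be fused without resampling.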