Deep neural networks have exhibited remarkable performance across a variety of computer vision tasks, especially semantic segmentation. Their success is often attributed to multi-level feature fusion, which enables them to understand both global and local information in an image. However, we found that multi-level features from parallel branches lie on different scales. This scale disequilibrium is a universal and unwanted flaw that harms gradient descent and thereby degrades segmentation performance. We show, with both theoretical and empirical evidence, that scale disequilibrium is caused by bilinear upsampling. Based on this observation, we propose injecting scale equalizers to restore scale equilibrium across multi-level features after bilinear upsampling. The proposed scale equalizers are easy to implement, applicable to any architecture, hyperparameter-free, incur no extra computational cost, and guarantee scale equilibrium on any dataset. Experiments showed that adopting scale equalizers consistently improved mIoU across target datasets, including ADE20K, PASCAL VOC 2012, and Cityscapes, as well as across decoder choices, including UPerHead, PSPHead, ASPPHead, SepASPPHead, and FCNHead.
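To illustrate the core observation, here is a minimal NumPy sketch, not the authors' exact formulation: it shows how bilinear upsampling shrinks the scale (standard deviation) of a feature map relative to a parallel full-resolution branch, and how a simple per-branch equalizer, here dividing the upsampled map by its measured standard deviation, restores scale equilibrium before fusion. The function names and the choice of std-based normalization are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def bilinear_upsample2x(x):
    """2x bilinear upsampling of a 2D map via separable linear interpolation."""
    h, w = x.shape
    rows = np.linspace(0, h - 1, 2 * h)
    cols = np.linspace(0, w - 1, 2 * w)
    # interpolate along columns of each row, then along rows of each column
    tmp = np.stack([np.interp(cols, np.arange(w), x[i]) for i in range(h)])
    out = np.stack([np.interp(rows, np.arange(h), tmp[:, j])
                    for j in range(2 * w)], axis=1)
    return out

# two parallel branches: one full-resolution map, one upsampled from half resolution
full = rng.standard_normal((32, 32))
low = rng.standard_normal((16, 16))
up = bilinear_upsample2x(low)

# bilinear interpolation averages neighbouring values, so the upsampled
# branch ends up on a smaller scale than the full-resolution branch
print(full.std(), up.std())  # up.std() is noticeably smaller

# scale equalizer (illustrative): rescale the upsampled branch to unit std
# so both branches contribute on the same scale when fused
eq = up / up.std()
fused = np.concatenate([full.ravel(), eq.ravel()])
```

Because the equalizer is a single per-branch rescaling, it adds essentially no computational cost, which is consistent with the abstract's claim that the method is hyperparameter-free and cost-free.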