MBConv blocks were originally designed for efficiency in resource-limited settings and were later adapted to reach state-of-the-art accuracy in image classification. Their application to semantic segmentation, however, has remained relatively unexplored. This paper introduces an adaptation of MBConv blocks tailored specifically to semantic segmentation. Our modification stems from the insight that semantic segmentation requires more detailed spatial information than image classification. We further argue that, for effective multi-scale semantic segmentation, every branch of a U-Net architecture should possess equivalent segmentation capability, regardless of its resolution. With these changes, our approach achieves a mean Intersection over Union (mIoU) of 84.5% on the Cityscapes test set and 84.0% on the validation set, demonstrating the efficacy of the proposed modifications for semantic segmentation.
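For readers unfamiliar with the reported metric, the following is a minimal sketch of how a mean IoU can be computed from flattened label maps. It is not the evaluation code used in this work. The function name `mean_iou`, the `ignore_index=255` default (a common convention for unlabeled pixels), and the toy inputs are assumptions made for illustration, and predictions are assumed to contain only valid class indices. Benchmark evaluation typically accumulates intersections and unions over the whole dataset before averaging, whereas this sketch scores a single pair of label sequences.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    pred, target: equal-length sequences of integer class labels.
    Pixels whose ground-truth label equals ignore_index are skipped
    (assumed convention, not taken from the paper).
    """
    intersection = [0] * num_classes  # true positives per class
    union = [0] * num_classes         # TP + FP + FN per class

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class

    # Classes absent from both prediction and ground truth are excluded.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes; the last pixel is ignored.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
```

In the toy example the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18.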