High-resolution semantic segmentation is computationally expensive. Existing approaches typically downscale the input image before processing and then upscale the low-resolution output back to the original dimensions. This strategy identifies broad regions well but often misses fine details. In this study, we show that a streamlined model that directly produces high-resolution segmentations can match the performance of more complex systems that generate lower-resolution results. Simplifying the network architecture makes it possible to process images at their native resolution. Our approach propagates information bottom-up across scales, which we show empirically improves segmentation accuracy. We evaluate our method on leading semantic segmentation datasets. On the Cityscapes dataset, we further improve accuracy by applying Noisy Student Training.
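As a toy illustration of why the downscale-then-upscale strategy can miss fine details, the sketch below round-trips a small label map through nearest-neighbour resampling. It is not the model described here: the helper names (`downscale_nearest`, `upscale_nearest`, `make_toy_labels`), the 8×8 label map, the resampling method, and the factor of 2 are all assumptions chosen for illustration.

```python
import numpy as np


def downscale_nearest(labels: np.ndarray, factor: int) -> np.ndarray:
    """Keep every `factor`-th pixel along both axes (nearest-neighbour downscaling)."""
    return labels[::factor, ::factor]


def upscale_nearest(labels: np.ndarray, factor: int) -> np.ndarray:
    """Repeat each pixel `factor` times along both axes (nearest-neighbour upscaling)."""
    return np.repeat(np.repeat(labels, factor, axis=0), factor, axis=1)


def make_toy_labels(size: int = 8) -> np.ndarray:
    """Hypothetical label map: class 1 fills the left half, class 2 is a one-pixel-wide column."""
    labels = np.zeros((size, size), dtype=np.int64)
    labels[:, : size // 2] = 1   # broad region
    labels[:, size - 3] = 2      # thin structure, e.g. a pole
    return labels


full = make_toy_labels()
factor = 2
round_trip = upscale_nearest(downscale_nearest(full, factor), factor)

# The broad region survives the round trip; the thin structure does not.
broad_region_kept = np.array_equal(round_trip == 1, full == 1)
thin_structure_kept = bool((round_trip == 2).any())
print("broad region preserved:", broad_region_kept)
print("thin structure preserved:", thin_structure_kept)
```

In this toy case the one-pixel-wide column falls between the sampled pixels, so no amount of upscaling can recover it. A real pipeline would use a learned network and smoother interpolation, but fine structures narrower than the downscaling factor are at similar risk.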