Accurate semantic segmentation models typically require significant computational resources, which inhibits their use in practical applications. Recent work relies on well-crafted lightweight models to achieve fast inference. However, these models cannot flexibly adapt to varying accuracy and efficiency requirements. In this paper, we propose a simple but effective slimmable semantic segmentation (SlimSeg) method, which can be executed at different capacities during inference depending on the desired accuracy-efficiency tradeoff. More specifically, we realize parametrized channel slimming through stepwise downward knowledge distillation during training. Motivated by the observation that the segmentation results of the submodels differ mainly near semantic borders, we introduce an additional boundary-guided semantic segmentation loss to further improve the performance of each submodel. We show that SlimSeg, applied to various mainstream networks, produces flexible models that allow dynamic adjustment of computational cost and perform better than independent models. Extensive experiments on the semantic segmentation benchmarks Cityscapes and CamVid demonstrate the generalization ability of our framework.
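The two training ingredients named in the abstract can be illustrated with a small, dependency-free sketch. It is not the SlimSeg implementation. It assumes that "stepwise downward" means each submodel width is distilled from the next larger width, and that "boundary guided" can be approximated by up-weighting the cross-entropy of pixels whose 4-neighbourhood contains a different label. All function names, the example width list, and the `boundary_weight` parameter are hypothetical choices made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) for one pixel; used here as the distillation term."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_pairs(widths):
    """Assumed reading of 'stepwise downward': each width teaches the next smaller one.

    Returns (teacher_width, student_width) pairs, largest teacher first.
    """
    ordered = sorted(widths, reverse=True)
    return list(zip(ordered[:-1], ordered[1:]))


def boundary_mask(labels):
    """Mark pixels that have a differently labelled 4-neighbour."""
    h, w = len(labels), len(labels[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] != labels[y][x]:
                    mask[y][x] = True
    return mask


def boundary_weighted_ce(logits, labels, boundary_weight=2.0):
    """Weighted-mean cross-entropy; boundary pixels count boundary_weight times.

    logits[y][x] is a list of class scores, labels[y][x] is the class index.
    The weighting scheme is an illustrative assumption, not the paper's loss.
    """
    mask = boundary_mask(labels)
    total, norm = 0.0, 0.0
    for y, row in enumerate(labels):
        for x, cls in enumerate(row):
            w = boundary_weight if mask[y][x] else 1.0
            total += -w * math.log(softmax(logits[y][x])[cls])
            norm += w
    return total / norm
```

For example, `distillation_pairs([1.0, 0.75, 0.5, 0.25])` pairs the full-width model with the 0.75 submodel, the 0.75 submodel with the 0.5 submodel, and so on. In a training loop built on this sketch, the widest model would be supervised by the ground truth, and each narrower submodel would add a `kl_divergence` term against its paired teacher alongside the boundary-weighted loss.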