Contemporary segmentation methods are usually based on deep fully convolutional networks (FCNs). However, layer-by-layer convolutions with a gradually growing receptive field are poor at capturing long-range context, such as lane markers in the scene. In this paper, we address this issue by designing a distillation method that exploits label structure when training a segmentation network. The intuition is that the ground-truth lane annotations themselves exhibit internal structure. We broadcast the structure hints throughout a teacher network, i.e., we train a teacher network that consumes a lane label map as input and attempts to replicate it as output. The attention maps of the teacher network are then adopted as supervisors of the student segmentation network. The teacher network, with label structure information embedded, knows distinctly where the convolution layers should direct their visual attention. The proposed method is named Label-guided Attention Distillation (LGAD). The student network learns significantly better with LGAD than when learning alone. As the teacher network is discarded after training, our method does not increase the inference time. Note that LGAD can be easily incorporated into any lane segmentation network.
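To make the supervision step concrete, the following is a minimal, dependency-free sketch of how a teacher attention map could supervise a student layer. The abstract does not say how attention maps are computed or how the loss is formed, so everything here is an assumption: attention is taken as the channel-wise sum of squared activations, L2-normalized over spatial positions (a common activation-based choice), and the distillation term is a mean squared difference. The names `attention_map`, `attention_distillation_loss`, `total_loss`, and the default `weight=0.1` are hypothetical and not taken from the paper.

```python
import math


def attention_map(features):
    """Collapse a C x H x W activation (nested lists) into a flat attention map.

    Assumption: attention is the channel-wise sum of squared activations,
    L2-normalized over all spatial positions.
    """
    channels = len(features)
    height = len(features[0])
    width = len(features[0][0])
    flat = [
        sum(features[c][y][x] ** 2 for c in range(channels))
        for y in range(height)
        for x in range(width)
    ]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0  # guard against all-zero maps
    return [v / norm for v in flat]


def attention_distillation_loss(student_features, teacher_features):
    """Mean squared difference between student and teacher attention maps.

    Channel counts may differ; only the spatial size (H x W) must match.
    """
    student = attention_map(student_features)
    teacher = attention_map(teacher_features)
    if len(student) != len(teacher):
        raise ValueError("attention maps must share the same spatial size")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def total_loss(segmentation_loss, student_layers, teacher_layers, weight=0.1):
    """Student objective: segmentation loss plus weighted distillation terms.

    `weight` is an arbitrary illustrative value, not a setting from the paper.
    """
    distill = sum(
        attention_distillation_loss(s, t)
        for s, t in zip(student_layers, teacher_layers)
    )
    return segmentation_loss + weight * distill
```

Because the distillation term is only added to the training objective, the teacher's activations are needed only while the student is trained, which is consistent with the claim that inference time is unchanged.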