Learning to segment a growing number of image region types over time is a desirable capability for many intelligent systems. However, such continual semantic segmentation suffers from the same catastrophic forgetting problem as continual classification. Although several knowledge distillation strategies originally developed for continual classification have been adapted to continual semantic segmentation, they transfer old knowledge only through the outputs of one or more layers of a deep fully convolutional network. In contrast to these solutions, this study proposes to transfer a new type of knowledge-related information: the relationships between elements (e.g., pixels or small local regions) within each image, which capture both within-class and between-class knowledge. This relationship information can be obtained from the self-attention maps of a Transformer-style segmentation model. Because pixels belonging to the same class in an image often share similar visual properties, class-specific region pooling is applied to yield more compact relationship information for knowledge transfer. Extensive evaluations on multiple public benchmarks show that the proposed self-attention transfer method further alleviates catastrophic forgetting, and that its flexible combination with one or more widely adopted strategies significantly outperforms state-of-the-art solutions.
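To make the idea of transferring class-pooled self-attention concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single attention map of shape (N, N) over N tokens per image, a per-token class label vector, mean pooling over class regions, and a mean-squared-error penalty between the old and new models' pooled maps. The function names `class_region_pool` and `attention_transfer_loss`, the choice of pooling, and the choice of loss are all illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax, used here only to build toy attention maps."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def class_region_pool(attn, labels, num_classes):
    """Average an (N, N) attention map over class regions, giving a (C, C) map.

    Entry [a, b] is the mean attention from tokens labelled a to tokens
    labelled b. Classes absent from the image produce all-zero rows/columns.
    Mean pooling is an assumption made for this sketch.
    """
    onehot = np.eye(num_classes)[labels]        # (N, C) class membership
    counts = onehot.sum(axis=0)                 # tokens per class, shape (C,)
    weights = onehot / np.maximum(counts, 1.0)  # avoid division by zero
    return weights.T @ attn @ weights           # (C, C) block means


def attention_transfer_loss(attn_old, attn_new, labels, num_classes):
    """MSE between class-pooled attention of the old and new models (assumed loss)."""
    p_old = class_region_pool(attn_old, labels, num_classes)
    p_new = class_region_pool(attn_new, labels, num_classes)
    return float(np.mean((p_old - p_new) ** 2))


# Toy example: 6 tokens, 3 classes, random attention maps for both models.
rng = np.random.default_rng(0)
labels = np.array([0, 0, 1, 1, 2, 2])
attn_old = softmax(rng.normal(size=(6, 6)))
attn_new = softmax(rng.normal(size=(6, 6)))
loss = attention_transfer_loss(attn_old, attn_new, labels, num_classes=3)
print(f"attention transfer loss: {loss:.6f}")
```

Pooling reduces the relationship map from N × N token pairs to C × C class pairs, which is one way to read the abstract's claim that region pooling makes the transferred information more efficient. In practice the labels would come from ground truth or from the old model's predictions, and the loss would be added to the usual segmentation and distillation terms; both of those details are assumptions here.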