Research in Continual Semantic Segmentation mainly investigates novel learning algorithms to overcome catastrophic forgetting in neural networks. Most recent publications have focused on improving learning algorithms without isolating the effects caused by the choice of neural architecture. Therefore, we study how the choice of neural network architecture affects catastrophic forgetting in class- and domain-incremental semantic segmentation. Specifically, we compare well-researched CNNs with recently proposed Transformers and hybrid architectures. We also examine the impact of novel normalization layers and different decoder heads. We find that traditional CNNs like ResNet have high plasticity but low stability, while Transformer architectures are much more stable. Hybrid architectures, which combine the inductive biases of CNNs with Transformers, achieve both higher plasticity and higher stability. The stability of these models can be explained by their ability to learn general features that are robust against distribution shifts. Experiments with different normalization layers show that Continual Normalization achieves the best trade-off between adaptability and stability. In the class-incremental setting, the choice of normalization layer has much less impact. Our experiments suggest that the right choice of architecture can significantly reduce forgetting even with naive fine-tuning. They also confirm that, for real-world applications, architecture is an important factor in designing a continual learning model.
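The plasticity/stability trade-off described above is commonly quantified from a matrix of segmentation scores (e.g., mIoU). In this matrix, `acc[i][j]` is the score on task or domain `j` after training step `i`. The following is a minimal sketch under that assumption. Plasticity is taken as the mean score on each task right after learning it. Forgetting, the inverse of stability, is taken as the average drop from the best earlier score to the final score. These definitions and the helper names `plasticity` and `forgetting` are common conventions and are hypothetical here. They are not necessarily the exact metrics used in this study.

```python
from typing import List

# acc[i][j]: score (e.g. mIoU) on task/domain j after training step i.
# Assumed lower-triangular: row i has i + 1 entries (tasks 0..i seen so far).
Scores = List[List[float]]


def plasticity(acc: Scores) -> float:
    """Mean score on each task measured right after training on it (the diagonal)."""
    return sum(acc[t][t] for t in range(len(acc))) / len(acc)


def forgetting(acc: Scores) -> float:
    """Average drop from the best earlier score to the final score, over all
    tasks except the last one (which cannot have been forgotten yet)."""
    final = len(acc) - 1
    if final == 0:
        return 0.0  # only one task trained: nothing to forget
    drops = []
    for j in range(final):
        # Best score on task j at any step before the final one.
        best = max(acc[i][j] for i in range(j, final))
        drops.append(best - acc[final][j])
    return sum(drops) / len(drops)
```

Under these assumed definitions, a model with high plasticity but low stability shows a high diagonal mean together with large forgetting. A more stable model keeps forgetting low, possibly at some cost in plasticity.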