Domain generalized semantic segmentation (DGSS) is a critical yet challenging task in which the model is trained only on source data, without access to any target data. Although numerous DGSS strategies have been proposed, the generalization capability of CNN architectures remains limited. Some Transformer-based segmentation models show promising performance, but they focus primarily on intra-sample attentive relationships and disregard inter-sample correlations that could benefit DGSS. To this end, we enhance the attention modules in Transformer networks for DGSS by incorporating information from other independent samples in the same batch, which enriches the contextual information and diversifies the training data seen by each attention block. Specifically, we propose two alternative intra-batch attention mechanisms, mean-based intra-batch attention (MIBA) and element-wise intra-batch attention (EIBA), which capture correlations between different samples and thereby improve feature representation and generalization. Building upon intra-batch attention, we introduce IBAFormer, which integrates self-attention modules with the proposed intra-batch attention for DGSS. Extensive experiments demonstrate that IBAFormer achieves state-of-the-art performance in DGSS, and ablation studies further confirm the effectiveness of each introduced component.
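The abstract gives no equations for MIBA or EIBA, so the following is only an illustrative sketch of what intra-batch attention could look like, not IBAFormer's actual implementation. It rests on several assumptions that are not stated in the text: the query, key, and value projections are taken to be the identity; the mean-based variant attends to the token-wise mean of the other samples in the batch; and the element-wise variant attends to each other sample separately and averages the results. All function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of token vectors."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def mean_tokens(samples):
    """Token-wise mean over samples that share the same (tokens, dim) shape."""
    n = len(samples)
    tokens, dim = len(samples[0]), len(samples[0][0])
    return [[sum(s[t][j] for s in samples) / n for j in range(dim)]
            for t in range(tokens)]


def mean_based_iba(batch):
    """Assumed mean-based variant: each sample queries the mean of the others."""
    if len(batch) < 2:
        raise ValueError("intra-batch attention needs at least two samples")
    outputs = []
    for i, sample in enumerate(batch):
        others = [s for k, s in enumerate(batch) if k != i]
        reference = mean_tokens(others)  # one averaged reference sample
        outputs.append(attention(sample, reference, reference))
    return outputs


def element_wise_iba(batch):
    """Assumed element-wise variant: attend to each other sample, then average."""
    if len(batch) < 2:
        raise ValueError("intra-batch attention needs at least two samples")
    outputs = []
    for i, sample in enumerate(batch):
        per_other = [attention(sample, s, s)
                     for k, s in enumerate(batch) if k != i]
        outputs.append(mean_tokens(per_other))
    return outputs


if __name__ == "__main__":
    # Toy batch: 3 samples, 2 tokens each, feature dimension 2.
    batch = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.5, 0.5], [0.2, 0.8]],
        [[0.9, 0.1], [0.3, 0.7]],
    ]
    print(mean_based_iba(batch)[0])
    print(element_wise_iba(batch)[0])
```

Under these assumptions the two variants coincide for a batch of two samples, since the mean of a single other sample is that sample itself; they differ only once three or more samples share a batch. In a real Transformer block the projections would be learned and the result would presumably be combined with ordinary self-attention, as the abstract describes for IBAFormer.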