Test-time domain adaptation (TTDA) for semantic segmentation aims to adapt a segmentation model trained on a source domain to a target domain on the fly during inference, where both efficiency and effectiveness are critical. However, existing TTDA methods either rely on costly frame-wise optimization or assume unrealistic domain shifts, resulting in poor adaptation efficiency and continuous semantic ambiguities. To address these challenges, we propose a real-time framework for TTDA semantic segmentation, called Dynamic Ambiguity-Wise Adaptation (DAWA), which adaptively detects domain shifts and dynamically adjusts its learning strategies to mitigate continuous ambiguities at test time. Specifically, we introduce the Dynamic Ambiguous Patch Mask (DAP Mask) strategy, which dynamically identifies and masks highly disturbed regions to prevent error accumulation in ambiguous classes. Furthermore, we present the Dynamic Ambiguous Class Mix (DAC Mix) strategy, which leverages vision-language models to group semantically similar classes and augments the target domain with a meta-ambiguous class buffer. Extensive experiments on widely used TTDA benchmarks demonstrate that DAWA consistently outperforms state-of-the-art methods while maintaining real-time inference speeds of approximately 40 FPS.
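To make the idea of masking highly disturbed regions concrete, the following is a minimal sketch and not the DAWA implementation. The abstract does not say how disturbance is measured or how the masking strength is adjusted, so the sketch assumes, purely for illustration, that a patch is "disturbed" when its mean predictive entropy is high and that a fixed fraction of patches is masked. The function name `patch_entropy_mask` and the parameters `patch` and `ratio` are hypothetical.

```python
import math


def patch_entropy_mask(probs, patch, ratio):
    """Illustrative patch masking (assumption: entropy as the disturbance score).

    probs: H x W x C nested lists of per-pixel class probabilities.
    patch: side length of the square patches, in pixels.
    ratio: fraction of patches to mask (fixed here; a dynamic schedule
           would replace this constant).
    Returns an H x W mask of 0/1, where 0 marks pixels in masked patches.
    """
    h, w = len(probs), len(probs[0])
    scores = []  # (mean entropy, patch top row, patch left column)
    for py in range(0, h, patch):
        for px in range(0, w, patch):
            total, count = 0.0, 0
            for y in range(py, min(py + patch, h)):
                for x in range(px, min(px + patch, w)):
                    # Shannon entropy of the per-pixel class distribution.
                    total += -sum(p * math.log(p) for p in probs[y][x] if p > 0)
                    count += 1
            scores.append((total / count, py, px))

    # Mask the `ratio` fraction of patches with the highest mean entropy.
    n_mask = int(len(scores) * ratio)
    scores.sort(key=lambda s: s[0], reverse=True)
    mask = [[1] * w for _ in range(h)]
    for _, py, px in scores[:n_mask]:
        for y in range(py, min(py + patch, h)):
            for x in range(px, min(px + patch, w)):
                mask[y][x] = 0
    return mask
```

In a test-time adaptation loop, a mask like this could be multiplied into the per-pixel self-training loss so that uncertain patches do not contribute gradients. Whether DAWA applies its mask in this way is not stated in the abstract.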