Unsupervised Domain Adaptation (UDA) seeks to transfer knowledge from a labeled source domain to an unlabeled target domain but often suffers from severe domain and scale gaps that degrade performance. Existing cross-attention-based transformers can align features across domains, yet they struggle to preserve content semantics under large appearance and scale variations. To explicitly address these challenges, we introduce the concept of beneficial noise, which regularizes cross-attention by injecting controlled perturbations, encouraging the model to ignore style distractions and focus on content. We propose the Domain-Adaptive Cross-Scale Matching (DACSM) framework, which consists of a Domain-Adaptive Transformer (DAT) for disentangling domain-shared content from domain-specific style, and a Cross-Scale Matching (CSM) module that adaptively aligns features across multiple resolutions. DAT incorporates beneficial noise into cross-attention, enabling progressive domain translation with enhanced robustness, yielding content-consistent and style-invariant representations. Meanwhile, CSM ensures semantic consistency under scale changes. Extensive experiments on VisDA-2017, Office-Home, and DomainNet demonstrate that DACSM achieves state-of-the-art performance, with up to +2.3% improvement over CDTrans on VisDA-2017. Notably, DACSM achieves a +5.9% gain on the challenging "truck" class of VisDA, evidencing the strength of beneficial noise in handling scale discrepancies. These results highlight the effectiveness of combining domain translation, beneficial-noise-enhanced attention, and scale-aware alignment for robust cross-domain representation learning.
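The abstract does not specify the exact form the beneficial noise takes; a minimal sketch, assuming Gaussian perturbations injected into the cross-attention logits during training (the function name, `sigma` parameter, and noise placement are all illustrative assumptions, not the paper's actual formulation):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def noisy_cross_attention(q, k, v, sigma=0.1, rng=None, training=True):
    """Cross-attention with a controlled perturbation on the logits.

    q: (n_q, d) queries from one domain; k, v: (n_kv, d) keys/values
    from the other. `sigma` (hypothetical) scales the injected noise;
    setting training=False disables it, as at inference time.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)          # scaled dot-product scores
    if training:
        rng = np.random.default_rng() if rng is None else rng
        # "Beneficial noise": perturb attention scores so the model
        # cannot rely on brittle, style-specific score patterns
        logits = logits + sigma * rng.standard_normal(logits.shape)
    return softmax(logits, axis=-1) @ v    # attention-weighted values
```

The intuition, under this reading, is that perturbing the scores acts as a regularizer on the attention map, pushing it toward content cues that survive the noise rather than style-specific correlations.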