Mean Teacher DETR with Masked Feature Alignment: A Robust Domain Adaptive Detection Transformer Framework

Unsupervised domain adaptation object detection (UDAOD) research on Detection Transformer (DETR) mainly focuses on feature alignment, and existing methods fall into two kinds, each with unresolved issues. One-stage feature alignment methods are prone to performance fluctuation and training stagnation. Two-stage feature alignment methods based on mean teacher comprise a pretraining stage followed by a self-training stage, which struggle, respectively, to obtain a reliable pretrained model and to achieve consistent performance gains. Neither kind has yet explored how to use a third related domain, such as a target-like domain, to assist adaptation. To address these issues, we propose a two-stage framework named MTM, i.e., Mean Teacher DETR with Masked Feature Alignment. In the pretraining stage, we use labeled target-like images produced by image style transfer to avoid performance fluctuation. In the self-training stage, we leverage unlabeled target images through mean-teacher pseudo labels and propose a module called Object Queries Knowledge Transfer (OQKT) to ensure consistent performance gains of the student model. Most importantly, we propose masked feature alignment methods, namely Masked Domain Query-based Feature Alignment (MDQFA) and Masked Token-wise Feature Alignment (MTWFA), to alleviate domain shift more robustly. These methods not only prevent training stagnation and yield a robust pretrained model in the pretraining stage, but also improve the model's target-domain performance in the self-training stage. Experiments on three challenging scenarios and a theoretical analysis verify the effectiveness of MTM.
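As a rough illustration of two ingredients named above, the sketch below shows a generic mean-teacher weight update (the teacher as an exponential moving average of the student) and a generic random token mask. It is a minimal sketch under stated assumptions, not the MTM implementation: the function names, the decay value, the mask ratio, and uniform random masking are all hypothetical, and the abstract does not specify how MDQFA or MTWFA choose or apply their masks.

```python
import random


def ema_update(teacher, student, decay=0.999):
    """Generic mean-teacher step: move each teacher weight toward the student.

    `teacher` and `student` are dicts mapping a parameter name to a float.
    The decay value is an assumed default, not taken from the paper.
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


def random_token_mask(num_tokens, mask_ratio, rng):
    """Return a 0/1 keep-mask that drops int(num_tokens * mask_ratio) tokens.

    Uniform random masking is an assumption made for illustration only.
    """
    num_masked = int(num_tokens * mask_ratio)
    masked = set(rng.sample(range(num_tokens), num_masked))
    return [0 if i in masked else 1 for i in range(num_tokens)]


if __name__ == "__main__":
    teacher = {"w": 1.0}
    student = {"w": 0.0}
    teacher = ema_update(teacher, student, decay=0.9)
    print(teacher)

    keep = random_token_mask(num_tokens=8, mask_ratio=0.5, rng=random.Random(0))
    print(keep)
```

In a setup like this, only the student would be trained by gradient descent, while the teacher would change only through the averaging step; the keep-mask would be multiplied into token features before they reach an alignment loss.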
