Domain generalization person re-identification (DG-ReID) aims to train a model on source domains and have it generalize well to unseen domains. Vision Transformers usually generalize better than common CNNs under distribution shifts. However, Transformer-based ReID models inevitably over-fit to domain-specific biases because of the supervised learning strategy applied on the source domain. We observe that while the global images of different IDs should have different features, their similar local parts (e.g., a black backpack) are not bound by this constraint. Motivated by this, we propose a pure Transformer model (termed Part-aware Transformer) for DG-ReID by designing a proxy task, named Cross-ID Similarity Learning (CSL), to mine local visual information shared by different IDs. This proxy task allows the model to learn generic features because it considers only the visual similarity of the parts, regardless of the ID labels, thus alleviating the side effects of domain-specific biases. Based on the local similarity obtained in CSL, we propose Part-guided Self-Distillation (PSD) to further improve the generalization of global features. Our method achieves state-of-the-art performance under most DG-ReID settings. Under the Market$\to$Duke setting, it exceeds the previous state of the art by 10.9% and 12.8% in Rank-1 and mAP, respectively. The code is available at https://github.com/liyuke65535/Part-Aware-Transformer.
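To make the idea of mining part-level similarity across IDs concrete, the following is a minimal sketch and not the released implementation. It assumes that each local part is already represented by a feature vector, that similarity is measured by cosine similarity, and that the top-k most similar parts are kept; none of these choices is specified in the abstract. The helper names `cosine` and `mine_cross_id_neighbors` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mine_cross_id_neighbors(part_feats, ids, k=1):
    """For each part feature, return the indices of the k most similar
    parts that belong to a different ID.

    Hypothetical helper for illustration only: the similarity measure and
    the top-k rule are assumptions, not taken from the paper.
    """
    neighbors = []
    for i, feat in enumerate(part_feats):
        # Only parts from other identities are candidates, so the match
        # depends on visual similarity rather than on the ID label.
        candidates = [
            (cosine(feat, other), j)
            for j, other in enumerate(part_feats)
            if ids[j] != ids[i]
        ]
        # Highest similarity first; ties are broken by the lower index.
        candidates.sort(key=lambda pair: (-pair[0], pair[1]))
        neighbors.append([j for _, j in candidates[:k]])
    return neighbors


# Toy example: two "backpack-like" parts and two "shoe-like" parts,
# each pair split across two different IDs.
toy_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_ids = [0, 1, 0, 1]
toy_neighbors = mine_cross_id_neighbors(toy_feats, toy_ids, k=1)
```

In this toy example each part is matched to the visually similar part of the other identity, which is the kind of label-independent signal the proxy task is described as exploiting. How the matched parts enter the training loss is not stated in the abstract and is left out of the sketch.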