We present a novel unsupervised domain adaptation method for person re-identification (reID) that generalizes a model trained on a labeled source domain to an unlabeled target domain. We introduce a camera-driven curriculum learning (CaCL) framework that leverages the camera labels of person images to transfer knowledge from the source to the target domain progressively. To this end, we divide the target-domain dataset into multiple subsets based on the camera labels, and initially train our model with a single subset (i.e., images captured by a single camera). We then gradually exploit more subsets for training, following a curriculum sequence obtained with a camera-driven scheduling rule. The scheduler computes the maximum mean discrepancy (MMD) between each subset and the source-domain dataset, so that subsets closer to the source domain are exploited earlier in the curriculum. At each stage of the curriculum, we generate pseudo labels for the target-domain person images and use them to train the reID model in a supervised manner. We observe that the pseudo labels are highly biased toward cameras: person images captured by the same camera tend to receive the same pseudo label, even when they belong to different identities. To address this camera bias, we also introduce a camera-diversity (CD) loss that encourages person images sharing a pseudo label but captured by different cameras to contribute more to discriminative feature learning, yielding person representations that are robust to inter-camera variations. Experimental results on standard benchmarks, covering both real-to-real and synthetic-to-real scenarios, demonstrate the effectiveness of our framework.
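The camera-driven scheduling rule can be illustrated with a small sketch. It is not the authors' implementation; it assumes that each image is already represented by a feature vector (e.g., from an encoder trained on the source domain), that MMD is estimated with the biased empirical estimator under a Gaussian kernel with a hand-picked bandwidth `sigma`, and that stage t of the curriculum trains on the union of the first t+1 camera subsets. All function names (`mmd2`, `camera_curriculum`, `curriculum_stages`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(xs: List[Vector], ys: List[Vector], sigma: float = 1.0) -> float:
    """Biased empirical estimate of the squared MMD between two sample sets."""
    k_xx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


def camera_curriculum(source_feats: List[Vector],
                      target_feats: List[Vector],
                      target_cams: List[int],
                      sigma: float = 1.0) -> List[int]:
    """Order target camera IDs from closest to farthest from the source domain."""
    # Split the target set into one subset per camera label.
    subsets: Dict[int, List[Vector]] = {}
    for feat, cam in zip(target_feats, target_cams):
        subsets.setdefault(cam, []).append(feat)
    # Score each camera subset by its (squared) MMD to the source features.
    scores = {cam: mmd2(feats, source_feats, sigma) for cam, feats in subsets.items()}
    # Smaller discrepancy -> scheduled earlier in the curriculum.
    return sorted(scores, key=lambda cam: scores[cam])


def curriculum_stages(order: List[int]) -> List[List[int]]:
    """Stage t uses the first t + 1 camera subsets (assumed cumulative schedule)."""
    return [order[: t + 1] for t in range(len(order))]


if __name__ == "__main__":
    source = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]]
    target_feats = [[0.05, 0.0], [0.0, 0.05],   # camera 2: close to source
                    [3.0, 3.0], [3.1, 2.9],     # camera 0: far from source
                    [1.0, 1.0], [1.1, 0.9]]     # camera 1: in between
    target_cams = [2, 2, 0, 0, 1, 1]
    order = camera_curriculum(source, target_feats, target_cams)
    print(order)                      # [2, 1, 0]
    print(curriculum_stages(order))   # [[2], [2, 1], [2, 1, 0]]
```

In this toy setup, the camera whose images lie nearest the source features is scheduled first, and each later stage adds the next-closest camera. In practice, the per-stage pseudo labels would be regenerated on the enlarged subset before supervised training continues; that step is omitted here.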