Person Re-Identification (ReID) aims to retrieve images of the same individual across non-overlapping camera views and has a wide range of applications in public safety. In recent years, advances in Vision Transformers (ViT) and self-supervised learning have substantially improved the performance of person ReID methods built on self-supervised pre-training. Person ReID requires highly discriminative, fine-grained local features of the human body, whereas a standard ViT excels at extracting context-dependent global features and therefore struggles to focus on local body features. To address this, this article introduces Masked Image Modeling (MIM), a recently emerged self-supervised learning method, into person ReID. By combining masked image modeling with discriminative contrastive learning during large-scale unsupervised pre-training, the model learns high-quality global and local features; it is then fine-tuned with supervision on the person ReID task. The resulting person feature extraction method, a ViT pre-trained with masked image modeling (PersonViT), is unsupervised, scalable, and generalizes well. It addresses the annotation difficulty of supervised person ReID and achieves state-of-the-art results on the public benchmarks MSMT17, Market1501, DukeMTMC-reID, and Occluded-Duke. The code and pre-trained models of PersonViT are released at \url{https://github.com/hustvl/PersonViT} to promote further research in person ReID.
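The abstract does not specify the exact form of the two pre-training objectives. As a rough illustration only, the sketch below assumes a pixel-level MSE reconstruction loss on masked patches for the MIM term and an InfoNCE loss between embeddings of two augmented views for the contrastive term, and shows how the two can be summed into a single pre-training loss. It is not PersonViT's actual objective. The function names, the 0.4 mask ratio, the temperature, and the weight `lam` are hypothetical, and random arrays stand in for encoder outputs.

```python
import numpy as np


def random_patch_mask(num_patches: int, mask_ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Boolean mask over patches; True marks a patch hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask


def masked_reconstruction_loss(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """MSE computed only on masked patches (assumed MIM-style local objective)."""
    diff = pred[mask] - target[mask]
    return float(np.mean(diff ** 2))


def info_nce_loss(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE: row i of z1 and row i of z2 are positives; all other rows are negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def combined_pretraining_loss(pred: np.ndarray, target: np.ndarray, mask: np.ndarray,
                              z1: np.ndarray, z2: np.ndarray, lam: float = 1.0) -> float:
    """Hypothetical weighted sum: global contrastive term + lam * local MIM term."""
    return info_nce_loss(z1, z2) + lam * masked_reconstruction_loss(pred, target, mask)


# Usage with random stand-ins: a 256x128 person image cut into 16x16 patches gives 128 patches.
rng = np.random.default_rng(0)
num_patches, patch_dim, batch, emb_dim = 128, 48, 4, 16
mask = random_patch_mask(num_patches, mask_ratio=0.4, rng=rng)
target = rng.normal(size=(num_patches, patch_dim))          # stand-in for original patch pixels
pred = target + 0.1 * rng.normal(size=target.shape)         # stand-in for decoder predictions
z1 = rng.normal(size=(batch, emb_dim))                      # stand-in for view-1 global embeddings
z2 = z1 + 0.05 * rng.normal(size=z1.shape)                  # stand-in for view-2 global embeddings
print(int(mask.sum()))  # 51 masked patches
print(combined_pretraining_loss(pred, target, mask, z1, z2))
```

Restricting the reconstruction term to masked patches is intended to push the encoder to infer local body details from visible context, while the contrastive term acts on whole-image embeddings; weighting them with `lam` is one simple way to balance local and global signals under these assumptions.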