Occluded person re-identification (Re-ID) is a challenging problem because occluders destroy part of the person's appearance information. Most existing methods rely on prior information to focus on the visible parts of the human body. However, when complementary occlusions occur, features from the occluded regions can interfere with matching and severely degrade performance. In this paper, unlike most previous works that discard the occluded region, we propose a Feature Completion Transformer (FCFormer) that implicitly completes the semantic information of occluded parts in the feature space. Specifically, we propose Occlusion Instance Augmentation (OIA), which simulates realistic and diverse occlusions on holistic images. The augmented images not only increase the number of occlusion samples in the training set, but also form pairs with the holistic images. A dual-stream architecture with a shared encoder then learns paired discriminative features from these paired inputs, so that an occluded-holistic feature sample-label pair is created automatically, without additional semantic information. Next, a Feature Completion Decoder (FCD) completes the features of occluded regions by using learnable tokens to aggregate possible information from the self-generated occluded features. Finally, we propose the Cross Hard Triplet (CHT) loss to further bridge the gap between completed features and extracted features of the same ID. In addition, the Feature Completion Consistency (FC$^2$) loss is introduced to pull the distribution of the generated completion features closer to the distribution of real holistic features. Extensive experiments on five challenging datasets demonstrate that the proposed FCFormer achieves superior performance and outperforms state-of-the-art methods by significant margins on occluded datasets.
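The pairing idea behind OIA can be illustrated with a minimal sketch. It is not the implementation used in this work: the abstract does not specify how occluders are obtained or placed, so the uniform random placement, the grayscale nested-list image format, and the names `paste_occluder` and `make_pair` are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional, Tuple

# Assumption: a grayscale image stored as rows of pixel values, for brevity.
Image = List[List[int]]


def paste_occluder(
    holistic: Image, occluder: Image, rng: random.Random
) -> Tuple[Image, Tuple[int, int]]:
    """Return a copy of `holistic` with `occluder` pasted at a random position.

    Hypothetical helper: uniform random placement is an assumption,
    not a detail taken from the abstract.
    """
    h, w = len(holistic), len(holistic[0])
    oh, ow = len(occluder), len(occluder[0])
    if oh > h or ow > w:
        raise ValueError("occluder must fit inside the image")
    top = rng.randint(0, h - oh)   # randint is inclusive on both ends
    left = rng.randint(0, w - ow)
    # Copy every row so the holistic image itself is left untouched.
    occluded = [row[:] for row in holistic]
    for r in range(oh):
        occluded[top + r][left:left + ow] = occluder[r]
    return occluded, (top, left)


def make_pair(
    holistic: Image, occluder: Image, person_id: int, seed: Optional[int] = None
) -> Dict[str, object]:
    """Build one occluded-holistic training pair that shares a single ID label."""
    rng = random.Random(seed)
    occluded, _ = paste_occluder(holistic, occluder, rng)
    return {"holistic": holistic, "occluded": occluded, "id": person_id}
```

Because both images in a pair carry the same identity label, the holistic image can serve as the target for the occluded one without any extra annotation, which is the property the dual-stream design relies on.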