A Semantic-aware Attention and Visual Shielding Network for Cloth-changing Person Re-identification

Cloth-changing person re-identification (ReID) is an emerging research topic that aims to retrieve pedestrians who have changed their clothes. Because a person's appearance varies greatly across outfits, existing approaches struggle to extract discriminative and robust feature representations. Current works mainly focus on body shape or contour sketches, while human semantic information and the potential consistency of pedestrian features before and after a clothing change are either not fully explored or ignored. To address these issues, this work proposes a novel semantic-aware attention and visual shielding network for cloth-changing person ReID (abbreviated as SAVS). The key idea is to shield clues related to the appearance of clothes and focus only on visual semantic information that is not sensitive to view/posture changes. Specifically, a visual semantic encoder is first employed to locate the human body and clothing regions based on human semantic segmentation information. Then, a human semantic attention (HSA) module is proposed to highlight the human semantic information and reweight the visual feature map. In addition, a visual clothes shielding (VCS) module is designed to extract a more robust feature representation for the cloth-changing task by covering the clothing regions and focusing the model on visual semantic information unrelated to the clothes. Most importantly, these two modules are jointly explored in an end-to-end unified framework. Extensive experiments demonstrate that the proposed method significantly outperforms state-of-the-art methods and extracts more robust features for cloth-changing persons. Compared with FSAM (published in CVPR 2021), the method achieves improvements of 32.7% (16.5%) and 14.9% (-) in mAP (rank-1) on the LTCC and PRCC datasets, respectively.
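
The two ideas described above, covering the clothing regions and reweighting features by human semantic information, can be illustrated with a small dependency-free sketch. It is a toy illustration under stated assumptions and not the authors' implementation: the parsing label IDs, the function names `shield_clothes` and `semantic_attention`, and the fixed `floor` weight are all hypothetical, and the attention here is a hand-set mask, whereas the abstract describes HSA as a module inside an end-to-end framework.

```python
# Toy sketch only. Label IDs, names, and weights are assumptions,
# not values taken from the SAVS paper.
CLOTHES_LABELS = {2, 3}  # assumed parsing labels: 2 = upper clothes, 3 = lower clothes


def shield_clothes(image, parsing, clothes_labels=CLOTHES_LABELS, fill=0.0):
    """Return a copy of a single-channel image with clothes pixels set to `fill`."""
    return [
        [fill if parsing[y][x] in clothes_labels else image[y][x]
         for x in range(len(image[0]))]
        for y in range(len(image))
    ]


def semantic_attention(feature, parsing, background_label=0, floor=0.1):
    """Reweight a single-channel feature map with a fixed semantic mask.

    Person pixels keep weight 1.0; background pixels are scaled by `floor`.
    """
    return [
        [feature[y][x] * (floor if parsing[y][x] == background_label else 1.0)
         for x in range(len(feature[0]))]
        for y in range(len(feature))
    ]


# 3x3 example: 0 = background, 1 = head, 2 = upper clothes, 3 = lower clothes.
parsing = [[0, 1, 0],
           [0, 2, 0],
           [0, 3, 0]]
image = [[0.2, 0.9, 0.2],
         [0.2, 0.7, 0.2],
         [0.2, 0.5, 0.2]]

shielded = shield_clothes(image, parsing)         # clothes pixels become 0.0
attended = semantic_attention(shielded, parsing)  # background is down-weighted
```

In this example the head pixel keeps its value through both steps, the two clothes pixels are zeroed by the shielding step, and the background pixels are scaled down by the assumed `floor` factor. The input `image` is left unmodified because both functions build new lists.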
