To address occlusion in person re-identification (ReID), many methods extract part features by introducing external spatial information. However, due to part appearance information missing under occlusion and noisy spatial information from external models, these purely vision-based approaches fail to correctly learn human body-part features from limited training data and struggle to accurately locate body parts, ultimately producing misaligned part features. To tackle these challenges, we propose a Prompt-guided Feature Disentangling method (ProFD), which leverages the rich pre-trained knowledge in the textual modality to help the model generate well-aligned part features. ProFD first designs part-specific prompts and uses noisy segmentation masks to preliminarily align visual and textual embeddings, giving the textual prompts spatial awareness. Furthermore, to alleviate noise from the external masks, ProFD adopts a hybrid-attention decoder that enforces spatial and semantic consistency during decoding to minimize the impact of noise. Additionally, to avoid catastrophic forgetting, we employ a self-distillation strategy that retains CLIP's pre-trained knowledge and mitigates over-fitting. Evaluation on the Market1501, DukeMTMC-ReID, Occluded-Duke, Occluded-ReID, and P-DukeMTMC datasets demonstrates that ProFD achieves state-of-the-art results. Our project is available at: https://github.com/Cuixxx/ProFD.
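The core alignment step can be illustrated with a minimal NumPy sketch: soft segmentation masks pool patch embeddings into per-part visual features, which are then compared against part-specific text prompt embeddings (e.g., CLIP text-encoder outputs) by cosine similarity. The function names, shapes, and pooling scheme below are our own illustration under these assumptions, not the authors' implementation.

```python
import numpy as np

def masked_part_pooling(patch_feats, part_masks):
    """Pool patch features into per-part features using (possibly noisy) masks.

    patch_feats: (N, D) array of visual patch embeddings.
    part_masks:  (K, N) array of soft masks, one row per body part.
    Returns a (K, D) array of part features, one per part prompt.
    """
    # Normalize each mask so its weights sum to 1 (epsilon guards empty masks).
    weights = part_masks / (part_masks.sum(axis=1, keepdims=True) + 1e-8)
    return weights @ patch_feats  # (K, D) weighted average per part

def part_prompt_similarity(part_feats, prompt_embeds):
    """Cosine similarity between each part feature and its text prompt embedding.

    part_feats:    (K, D) pooled visual part features.
    prompt_embeds: (K, D) part-specific text embeddings (hypothetical inputs).
    Returns a (K,) array of similarities in [-1, 1].
    """
    v = part_feats / np.linalg.norm(part_feats, axis=1, keepdims=True)
    t = prompt_embeds / np.linalg.norm(prompt_embeds, axis=1, keepdims=True)
    return (v * t).sum(axis=1)
```

In a training loop, these similarities could serve as alignment logits so that each textual prompt is pulled toward the image region its mask covers, which is one plausible reading of how the prompts acquire spatial awareness.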