Dynamic Patch-aware Enrichment Transformer for Occluded Person Re-Identification

Person re-identification (re-ID) remains challenging, particularly under occlusion. Prior approaches to occlusion have mainly aligned physical body features using external semantic cues, but these methods tend to be complex and sensitive to noise. To address these issues, we present the Dynamic Patch-aware Enrichment Transformer (DPEFormer), an end-to-end model that automatically and dynamically distinguishes human body information from occlusions without external detectors or precise image alignment. Specifically, we introduce a dynamic patch token selection module (DPSM), which uses a label-guided proxy token as an intermediary to identify informative, occlusion-free tokens; the selected tokens are then used to derive local part features. To integrate the global classification features with the fine-grained local features selected by DPSM, we introduce a feature blending module (FBM), which enhances the feature representation by exploiting complementary information and part diversity. Furthermore, so that DPSM and the entire DPEFormer can learn effectively with only identity labels, we propose a Realistic Occlusion Augmentation (ROA) strategy. ROA leverages the Segment Anything Model (SAM) to generate occlusion images that closely resemble real-world occlusions, which benefits the subsequent contrastive learning process. Experiments on occluded and holistic re-ID benchmarks show that DPEFormer substantially outperforms existing state-of-the-art approaches. The code will be made publicly available.
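
To make the token selection idea concrete, the following is a minimal sketch of selecting patch tokens by their similarity to a proxy token. It is not the DPSM implementation: the abstract does not state how tokens are scored or how many are kept, so the cosine-similarity score, the fixed `keep_ratio`, and the names `cosine` and `select_tokens` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def select_tokens(patch_tokens, proxy_token, keep_ratio=0.5):
    """Return indices of the patch tokens most similar to the proxy token.

    Assumption: similarity to the proxy token stands in for "informative and
    occlusion-free"; the scoring rule used by DPSM is not given in the abstract.
    Indices are returned in their original spatial order.
    """
    k = max(1, int(len(patch_tokens) * keep_ratio))
    scores = [cosine(t, proxy_token) for t in patch_tokens]
    ranked = sorted(range(len(patch_tokens)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy example: four 2-D "patch tokens" and a proxy token pointing along the first axis.
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
proxy = [1.0, 0.0]
kept = select_tokens(patches, proxy, keep_ratio=0.5)
print(kept)
```

In this toy setup the two tokens aligned with the proxy are kept and the orthogonal and opposite tokens are dropped. A learned, input-dependent selection rule, as the word "dynamic" suggests, would replace the fixed `keep_ratio` used here.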
