Dominant person search methods aim to localize and recognize query persons in a unified network that jointly optimizes two sub-tasks, i.e., pedestrian detection and Re-IDentification (ReID). Despite significant progress, current methods face two primary challenges: 1) the pedestrian candidates learned within detectors are suboptimal for the ReID task; 2) the potential for collaboration between the two sub-tasks is overlooked. To address these issues, we present PSDiff, a novel person search framework based on the diffusion model. PSDiff formulates person search as a dual denoising process from noisy boxes and ReID embeddings to ground truths. Distinct from the conventional Detection-to-ReID approach, our denoising paradigm discards the prior pedestrian candidates generated by detectors, thereby avoiding the local-optimum problem of the ReID task. Following the new paradigm, we further design a new Collaborative Denoising Layer (CDL) that optimizes the detection and ReID sub-tasks in an iterative and collaborative way, making the two sub-tasks mutually beneficial. Extensive experiments on the standard benchmarks show that PSDiff achieves state-of-the-art performance with fewer parameters and elastic computing overhead.
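To make the "dual denoising" formulation concrete, the following is a minimal sketch of the forward (noising) half of such a process: a ground-truth box and a ground-truth ReID embedding are corrupted with Gaussian noise at the same timestep, producing the noisy pair that a denoising network would be trained to map back to the ground truth. The abstract does not specify the noise schedule, the box parameterization, or the embedding size, so the cosine schedule and the names `cosine_alpha_bar`, `add_noise`, and `noisy_training_pair` are illustrative assumptions rather than the PSDiff implementation.

```python
import math
import random


def cosine_alpha_bar(t, num_steps, s=0.008):
    """Cumulative signal fraction at step t under a cosine schedule.

    Assumption: the abstract does not state which schedule is used.
    """
    def f(u):
        return math.cos((u / num_steps + s) / (1 + s) * math.pi / 2) ** 2

    return f(t) / f(0)


def add_noise(x0, t, num_steps, rng):
    """Standard forward diffusion step: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    a = cosine_alpha_bar(t, num_steps)
    return [
        math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
        for v in x0
    ]


def noisy_training_pair(gt_box, gt_embedding, t, num_steps=1000, seed=0):
    """Corrupt a box and a ReID embedding at the same timestep t.

    Hypothetical helper: both inputs are plain lists of floats, e.g. a
    normalized (cx, cy, w, h) box and a small embedding vector.
    """
    rng = random.Random(seed)
    noisy_box = add_noise(gt_box, t, num_steps, rng)
    noisy_emb = add_noise(gt_embedding, t, num_steps, rng)
    return noisy_box, noisy_emb


if __name__ == "__main__":
    box = [0.5, 0.5, 0.2, 0.4]       # illustrative normalized box
    emb = [0.1, -0.3, 0.7, 0.0]      # illustrative 4-d embedding
    print(noisy_training_pair(box, emb, t=500))
```

At `t = 0` the pair is returned unchanged, and as `t` approaches `num_steps` both the box and the embedding approach pure Gaussian noise. Under this reading, inference would start from such random boxes and embeddings instead of detector proposals, which is the property the abstract highlights.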