Dominant person search methods aim to localize and recognize query persons in a unified network that jointly optimizes two sub-tasks, i.e., pedestrian detection and Re-IDentification (ReID). Despite significant progress, current methods face two primary challenges: 1) the pedestrian candidates learned within detectors are suboptimal for the ReID task; 2) the potential for collaboration between the two sub-tasks is overlooked. To address these issues, we present PSDiff, a novel person search framework based on the diffusion model. PSDiff formulates person search as a dual denoising process from noisy boxes and ReID embeddings to ground truths. Distinct from the conventional Detection-to-ReID approach, our denoising paradigm discards the prior pedestrian candidates generated by detectors, thereby avoiding the local optimum problem of the ReID task. Following the new paradigm, we further design a new Collaborative Denoising Layer (CDL) to optimize the detection and ReID sub-tasks in an iterative and collaborative way, which makes the two sub-tasks mutually beneficial. Extensive experiments on standard benchmarks show that PSDiff achieves state-of-the-art performance with fewer parameters and elastic computing overhead.
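To make the "dual denoising" formulation concrete, the following is a minimal sketch of the corruption side of such a process: ground-truth boxes and ReID embeddings are both perturbed with Gaussian noise, and a network would then be trained to recover them. The abstract does not specify how noise is added, so everything here is an assumption for illustration: the standard DDPM-style forward process, the cosine schedule, the normalized `(cx, cy, w, h)` box format, the 256-dimensional embeddings, and the helper names `cosine_alpha_bar` and `add_noise`. The learned denoiser (for example, the CDL) is not shown.

```python
import numpy as np


def cosine_alpha_bar(num_steps: int, s: float = 0.008) -> np.ndarray:
    """Cumulative signal rate alpha_bar_t for t = 0..num_steps (cosine schedule, assumed)."""
    t = np.arange(num_steps + 1) / num_steps
    f = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    return f / f[0]


def add_noise(x0: np.ndarray, t: int, alpha_bar: np.ndarray,
              rng: np.random.Generator) -> np.ndarray:
    """Sample x_t from q(x_t | x_0) = N(sqrt(a_t) * x_0, (1 - a_t) * I)."""
    a = alpha_bar[t]
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise


rng = np.random.default_rng(0)
alpha_bar = cosine_alpha_bar(num_steps=1000)

# Hypothetical ground truth for three persons in one scene image.
gt_boxes = np.array([[0.30, 0.50, 0.10, 0.40],
                     [0.55, 0.45, 0.12, 0.42],
                     [0.80, 0.52, 0.09, 0.38]])  # normalized (cx, cy, w, h)
gt_embeds = rng.standard_normal((3, 256))
gt_embeds /= np.linalg.norm(gt_embeds, axis=1, keepdims=True)  # unit-norm ReID features

# Both targets are corrupted at the same timestep; a denoiser would be
# trained to map (noisy_boxes, noisy_embeds, t) back to the ground truths.
t = 500
noisy_boxes = add_noise(gt_boxes, t, alpha_bar, rng)
noisy_embeds = add_noise(gt_embeds, t, alpha_bar, rng)
```

At inference time a process of this kind would start from pure noise for both boxes and embeddings and refine them over several steps, which is one plausible reading of the "elastic computing overhead" mentioned above: fewer steps cost less compute, more steps may improve accuracy.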