Dominant person search methods aim to localize and recognize query persons in a unified network that jointly optimizes two sub-tasks, i.e., detection and re-identification (ReID). Despite significant progress, two major challenges remain: 1) the detection-prior modules in previous methods are suboptimal for the ReID task; 2) the collaboration between the two sub-tasks is ignored. To alleviate these issues, we present PSDiff, a novel person search framework based on the diffusion model. PSDiff formulates person search as a dual denoising process from noisy boxes and ReID embeddings to ground truths. Unlike existing methods that follow the Detection-to-ReID paradigm, our denoising paradigm eliminates detection-prior modules and thereby avoids local optima in the ReID task. Following the new paradigm, we further design a Collaborative Denoising Layer (CDL) that optimizes the detection and ReID sub-tasks in an iterative and collaborative way, making the two sub-tasks mutually beneficial. Extensive experiments on standard benchmarks show that PSDiff achieves state-of-the-art performance with fewer parameters and elastic computing overhead.
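To make the "dual denoising" formulation concrete, the following is a minimal sketch of the forward (noising) half of such a process, applied to a ground-truth box and a ground-truth ReID embedding. It is not the PSDiff implementation: the cosine noise schedule, the normalized (cx, cy, w, h) box format, the embedding size, and the names `cosine_alpha_bar`, `add_noise`, and `noisy_pair` are all assumptions chosen for illustration. A trained network would then learn the reverse mapping from these noisy inputs back to the ground truths.

```python
import math
import random


def cosine_alpha_bar(t, num_steps, s=0.008):
    """Cumulative signal rate alpha_bar_t for a cosine schedule (assumed, not from the paper)."""
    def f(u):
        return math.cos((u / num_steps + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def add_noise(x0, t, num_steps, rng):
    """Sample x_t = sqrt(a) * x0 + sqrt(1 - a) * eps, with eps drawn from N(0, I)."""
    a = cosine_alpha_bar(t, num_steps)
    signal, noise = math.sqrt(a), math.sqrt(1.0 - a)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


def noisy_pair(gt_box, gt_embedding, t, num_steps=1000, seed=0):
    """Corrupt a box and a ReID embedding at the same timestep t.

    Both targets share one timestep, so a denoiser sees them at a matching
    noise level (an illustrative choice, not a claim about PSDiff).
    """
    rng = random.Random(seed)
    noisy_box = add_noise(gt_box, t, num_steps, rng)
    noisy_emb = add_noise(gt_embedding, t, num_steps, rng)
    return noisy_box, noisy_emb


if __name__ == "__main__":
    box = [0.5, 0.5, 0.2, 0.4]   # hypothetical normalized (cx, cy, w, h)
    emb = [0.1] * 8              # hypothetical 8-dimensional ReID embedding
    for step in (0, 500, 1000):
        nb, ne = noisy_pair(box, emb, step)
        print(step, [round(v, 3) for v in nb])
```

At `t = 0` the inputs are returned unchanged, and as `t` approaches `num_steps` they approach pure Gaussian noise, which is the starting point a denoising model would refine at inference time.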