Person search, which aims to find specific pedestrians in footage from real cameras, has recently emerged as a challenging task in computer vision. However, most surveillance videos contain only a handful of images of each pedestrian, and these images often share the same background and clothing. It is therefore difficult to learn discriminative features for person search in real scenes. To tackle this challenge, we draw on Generative Adversarial Networks (GANs) to synthesize data from surveillance videos. GANs have thrived in computer vision because they produce high-quality images efficiently. We make only minor modifications to the popular Fast R-CNN model, which is capable of processing videos and yielding accurate detection results. To ease the burden imposed by the two-stage model, we design an Assisted-Identity Query Module (AIDQ) that supplies positive images to the subsequent stage. In addition, we propose a novel GAN-based Scene Synthesis model that can synthesize high-quality cross-id person images for person search tasks. To facilitate feature learning in the GAN-based Scene Synthesis model, we adopt an online learning strategy that learns jointly from the synthesized and original images. Extensive experiments on two widely used person search benchmarks, CUHK-SYSU and PRW, show that our method achieves strong performance, and an extensive ablation study further indicates that our GAN-synthesized data effectively increase the variability of the datasets and are more realistic.