Person search aims to jointly localize and identify a query person in natural, uncropped images, and has been actively studied over the past few years. In this paper, we delve into the rich context information that surrounds the target person globally and locally, which we refer to as scene context and group context, respectively. Unlike previous works that treat the two types of context individually, we exploit them in a unified global-local context network (GLCNet) with the intuitive aim of feature enhancement. Specifically, re-ID embeddings and context features are learned simultaneously in a multi-stage fashion, ultimately yielding enhanced, discriminative features for person search. We conduct experiments on two person search benchmarks (i.e., CUHK-SYSU and PRW) and extend our approach to a more challenging setting (i.e., character search on MovieNet). Extensive experimental results demonstrate consistent improvements of the proposed GLCNet over state-of-the-art methods on all three datasets. Our source code, pre-trained models, and the new dataset are publicly available at: https://github.com/ZhengPeng7/GLCNet.