Adaptive Generation of Privileged Intermediate Information for Visible-Infrared Person Re-Identification

Visible-infrared person re-identification (V-I ReID) seeks to retrieve images of the same individual captured over a distributed network of visible (V, RGB) and infrared (I, IR) sensors. Several V-I ReID approaches directly integrate the V and I modalities to discriminate persons within a shared representation space. However, the significant gap between the data distributions of the V and I modalities makes cross-modal V-I ReID challenging. Some recent approaches improve generalization by leveraging intermediate spaces that bridge the V and I modalities, yet effective methods are still required to select or generate data for such informative domains. In this paper, the Adaptive Generation of Privileged Intermediate Information (AGPI^2) training approach is introduced to adapt and generate a virtual domain that bridges discriminant information between the V and I modalities. The key motivation behind AGPI^2 is to enhance the training of a deep V-I ReID backbone by generating privileged images that provide additional information. These privileged images capture shared discriminative features that are not easily accessible within the original V or I modalities alone. Towards this goal, a non-linear generative module is trained with an adversarial objective, translating V images into intermediate spaces with a smaller domain shift w.r.t. the I domain. Meanwhile, the embedding module within AGPI^2 aims to produce similar features for both V and generated images, encouraging the extraction of features that are common to all modalities. In addition, AGPI^2 employs adversarial objectives for adapting the intermediate images, which play a crucial role in creating a non-modality-specific space that addresses the large domain shift between the V and I domains. Experimental results on challenging V-I ReID datasets indicate that AGPI^2 increases matching accuracy without extra computational resources during inference.
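The two training signals described above (an adversarial objective for the generative module, and a feature-similarity objective for the embedding module) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the squared-distance form of the similarity term, the binary cross-entropy form of the adversarial term, the weighting parameter `lambda_adv`, and all function names. Features and discriminator outputs are plain Python lists standing in for the outputs of deep networks.

```python
import math


def feature_consistency_loss(feats_v, feats_z):
    """Mean squared Euclidean distance between paired feature vectors.

    feats_v: features of visible images (list of lists of floats).
    feats_z: features of the corresponding generated intermediate images.
    Assumed form; it only expresses "similar features for both V and
    generated images".
    """
    total = 0.0
    for fv, fz in zip(feats_v, feats_z):
        total += sum((a - b) ** 2 for a, b in zip(fv, fz))
    return total / len(feats_v)


def bce(prob, label, eps=1e-12):
    """Binary cross-entropy for one probability, clamped for stability."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))


def generator_adversarial_loss(disc_probs_ir):
    """Generator-side adversarial loss (assumed form).

    disc_probs_ir: a hypothetical discriminator's estimates of
    P(image is infrared) for generated images. The generator is rewarded
    when these are close to 1, i.e. when the generated images have a
    small domain shift w.r.t. the I domain.
    """
    return sum(bce(p, 1.0) for p in disc_probs_ir) / len(disc_probs_ir)


def total_loss(feats_v, feats_z, disc_probs_ir, lambda_adv=0.5):
    """Weighted sum of both terms; lambda_adv is a hypothetical weight."""
    consistency = feature_consistency_loss(feats_v, feats_z)
    adversarial = generator_adversarial_loss(disc_probs_ir)
    return consistency + lambda_adv * adversarial
```

In this sketch the generated images appear only inside the training losses, which is consistent with the statement that no extra computational resources are needed during inference: at test time only the embedding of real V and I images would be computed.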
