Visible-infrared person re-identification (VI-ReID) is challenging due to considerable cross-modality discrepancies. Existing works mainly focus on learning modality-invariant features while suppressing modality-specific ones. However, retrieving visible images from infrared samples alone is an extreme problem, because infrared images carry no color information. To this end, we present the Refer-VI-ReID setting, which aims to match target visible images using both infrared images and coarse language descriptions (e.g., "a man with red top and black pants") that complement the missing color information. To address this task, we design a Y-Y-shape decomposition structure, dubbed YYDS, to decompose and aggregate the texture and color features of targets. Specifically, a text-IoU regularization strategy is first presented to facilitate the decomposition training, and a joint relation module is then proposed to infer the aggregation. Furthermore, we investigate a cross-modal version of the k-reciprocal re-ranking algorithm, named CMKR, in which three neighbor search strategies and one local query expansion method are explored to alleviate the modality bias problem of the nearest neighbors. We conduct experiments on the SYSU-MM01, RegDB and LLCM datasets with our manually annotated descriptions. Both YYDS and CMKR achieve remarkable improvements over state-of-the-art (SOTA) methods on all three datasets. Code is available at https://github.com/dyhBUPT/YYDS.
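For readers unfamiliar with k-reciprocal re-ranking, the sketch below illustrates only the basic k-reciprocal neighbor test that such methods build on: sample j counts as a neighbor of sample i when each appears in the other's top-k list. It is not CMKR itself. The cross-modal neighbor search strategies and the local query expansion are not reproduced here, and the function names, the toy 1-D "features" and the absolute-difference distance are all assumptions made for illustration.

```python
import numpy as np

def k_nearest(dist, i, k):
    """Indices of the k samples closest to sample i (i itself included)."""
    return np.argsort(dist[i], kind="stable")[:k]

def k_reciprocal_neighbors(dist, i, k):
    """Samples j in i's top-k list whose own top-k list also contains i."""
    forward = k_nearest(dist, i, k)
    return [int(j) for j in forward if i in k_nearest(dist, j, k)]

# Toy data (hypothetical): five 1-D features and their pairwise distances.
points = np.array([0.0, 1.0, 3.0, 10.0, 11.0])
dist = np.abs(points[:, None] - points[None, :])

# Sample 2 (value 3.0) has sample 1 in its top-2 list, but sample 1 prefers
# sample 0, so the relation is not mutual and sample 1 is filtered out.
print(k_reciprocal_neighbors(dist, 0, 2))
print(k_reciprocal_neighbors(dist, 2, 2))
```

The mutual check is what makes the neighbor set stricter than a plain top-k list. In a cross-modal gallery, a plain top-k list can be dominated by samples of one modality, which is the bias the abstract says CMKR is designed to alleviate.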