IIITD-20K: Dense captioning for Text-Image ReID

Text-to-Image (T2I) ReID has attracted considerable attention in recent years. CUHK-PEDES, RSTPReid, and ICFG-PEDES are the three available benchmarks for evaluating T2I ReID methods. RSTPReid and ICFG-PEDES comprise identities from MSMT17, but their diversity is limited by the small number of unique persons. CUHK-PEDES, on the other hand, comprises 13,003 identities but has relatively short text descriptions on average. Further, these datasets were captured in restricted environments with a limited number of cameras. To further diversify the identities and provide dense captions, we propose a novel dataset called IIITD-20K. IIITD-20K comprises 20,000 unique identities captured in the wild and provides a rich dataset for text-to-image ReID. Each image is densely captioned, with a description of at least 26 words. We further synthetically generate images and fine-grained captions using Stable Diffusion and BLIP models trained on our dataset. We perform elaborate experiments using state-of-the-art text-to-image ReID models and vision-language pre-trained models and present a comprehensive analysis of the dataset. Our experiments also reveal that synthetically generated data leads to a substantial performance improvement in both same-dataset and cross-dataset settings. Our dataset is available at https://bit.ly/3pkA3Rj.

