DLCR: A Generative Data Expansion Framework via Diffusion for Clothes-Changing Person Re-ID

With the recently exhibited strength of generative diffusion models, an open research question is \textit{whether images generated by these models can be used to learn better visual representations}. While this generative data expansion may suffice for easier visual tasks, we explore its efficacy on a more difficult discriminative task: clothes-changing person re-identification (CC-ReID). CC-ReID aims to match people appearing in non-overlapping cameras, even when they change their clothes across cameras. Current CC-ReID models are constrained by the limited clothing diversity of existing CC-ReID datasets, and generating additional data that retains the personal features needed for accurate identification remains a challenge. To address this issue, we propose DLCR, a novel data expansion framework that leverages pre-trained diffusion models and large language models (LLMs) to accurately generate diverse images of individuals in varied attire. We generate additional data for five benchmark CC-ReID datasets (PRCC, CCVID, LaST, VC-Clothes, and LTCC) and \textbf{increase their clothing diversity by \boldmath{$10$}x, totaling over \boldmath{$2.1$}M images generated}. DLCR employs diffusion-based text-guided inpainting, conditioned on clothing prompts constructed using LLMs, to generate synthetic data that modifies only a subject's clothes while preserving their personally identifiable features. With this massive increase in data, we introduce two novel strategies, progressive learning and test-time prediction refinement, that respectively reduce training time and further boost CC-ReID performance. On the PRCC dataset, we obtain a large top-1 accuracy improvement of $11.3\%$ by training CAL, a previous state-of-the-art (SOTA) method, with DLCR-generated data. We publicly release our code and generated data for each dataset here: \url{https://github.com/CroitoruAlin/dlcr}.
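
The constraint that generation modifies only a subject's clothes can be illustrated with a toy masked-compositing sketch. This is not the DLCR implementation: the function name `composite_clothes`, the binary clothing mask, and the nested-list image format are assumptions made for illustration. In a real pipeline the generated pixels would come from a pre-trained diffusion inpainting model conditioned on a clothing prompt.

```python
def composite_clothes(original, generated, clothes_mask):
    """Return an image that takes generated pixels inside the clothing mask
    and keeps the original pixels everywhere else.

    All three arguments are hypothetical H x W nested lists: `original` and
    `generated` hold RGB tuples, `clothes_mask` holds booleans.
    """
    return [
        [
            gen_px if is_clothes else orig_px
            for orig_px, gen_px, is_clothes in zip(orig_row, gen_row, mask_row)
        ]
        for orig_row, gen_row, mask_row in zip(original, generated, clothes_mask)
    ]


# Toy 2 x 2 example: black original image, white "generated" image.
BLACK, WHITE = (0, 0, 0), (255, 255, 255)
original = [[BLACK, BLACK], [BLACK, BLACK]]
generated = [[WHITE, WHITE], [WHITE, WHITE]]
clothes_mask = [[False, False], [True, True]]  # bottom row is "clothing"

result = composite_clothes(original, generated, clothes_mask)
```

In this toy case the top row (standing in for identity-bearing regions such as the face) stays untouched, while the bottom row is replaced by generated content.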

