Recent text-to-image generation models have shown promising results in generating high-fidelity, photo-realistic images. Though the results are astonishing to human eyes, how applicable these generated images are to recognition tasks remains under-explored. In this work, we extensively study whether and how synthetic images generated by state-of-the-art text-to-image generation models can be used for image recognition tasks, focusing on two perspectives: synthetic data for improving classification models in data-scarce settings (i.e., zero-shot and few-shot), and synthetic data for large-scale model pre-training for transfer learning. We showcase the strengths and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data to recognition tasks. Code: https://github.com/CVMI-Lab/SyntheticData.