Advancements in large pre-trained generative models have expanded their potential as effective data generators in visual recognition. This work studies the impact of generated images, primarily comparing paradigms that harness external data (\ie generative \vs retrieval \vs original). Our key contributions are: \textbf{1) GenBench Construction:} We construct \textbf{GenBench}, a broad benchmark comprising 22 datasets with 2548 categories, to evaluate generative data across various visual recognition tasks. \textbf{2) CLER Score:} To address the insufficient correlation of existing metrics (\eg, FID, CLIP score) with downstream recognition performance, we propose \textbf{CLER}, a training-free metric that indicates the efficiency of generative data for recognition tasks prior to training. \textbf{3) New Baselines:} We compare generative data with data retrieved from the same external pool, which helps elucidate the unique traits of generative data. \textbf{4) External Knowledge Injection:} Fine-tuning special token embeddings for each category via Textual Inversion improves performance across 17 datasets, except when the reference images are low-resolution. Our extensive benchmark and analysis highlight the promise of generative data in visual recognition and identify key challenges for future investigation.