In recent years, text-to-image generative models have progressed significantly. Evaluating the quality of these models is an essential step in their development. Unfortunately, evaluation can consume substantial computational resources, which makes the periodic evaluation that development requires (e.g., monitoring training progress) impractical. We therefore seek to improve evaluation efficiency by selecting a representative subset of the text-image dataset. We systematically investigate the design choices, including the selection criteria (textual features or image-based metrics) and the selection granularity (prompt-level or set-level). We find that insights from prior work on subset selection for training data do not generalize to this problem, and we propose FlashEval, an iterative search algorithm tailored to evaluation data selection. We demonstrate the effectiveness of FlashEval by ranking diffusion models with various configurations, including architectures, quantization levels, and sampler schedules, on the COCO and DiffusionDB datasets. On COCO annotations, our searched 50-item subset achieves evaluation quality comparable to that of a randomly sampled 500-item subset on unseen models, a 10x evaluation speedup. We release the condensed subsets of these commonly used datasets to facilitate diffusion algorithm design and evaluation, and we open-source FlashEval as a tool for condensing future datasets, available at https://github.com/thu-nics/FlashEval.
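To make the idea of "evaluation quality of a subset" concrete, the following is a minimal sketch and not FlashEval's actual algorithm. It assumes a matrix of per-prompt scores for several models, measures how well a prompt subset reproduces the full-set model ranking using Kendall's tau, and compares a random subset against a simple swap-based hill climb. All function names (`kendall_tau`, `model_scores`, `subset_quality`, `search_subset`) and the synthetic scores are hypothetical, introduced only for illustration. Unlike the setting described in the abstract, this sketch tunes and measures the subset on the same models, so it says nothing about generalization to unseen models.

```python
import random
from itertools import combinations


def kendall_tau(a, b):
    """Kendall rank correlation (tau-a) between two equal-length score lists."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


def model_scores(scores, prompt_ids):
    """Mean score of each model over the chosen prompts."""
    return [sum(row[p] for p in prompt_ids) / len(prompt_ids) for row in scores]


def subset_quality(scores, prompt_ids):
    """Agreement between the subset-based and full-set model rankings."""
    full = model_scores(scores, range(len(scores[0])))
    return kendall_tau(full, model_scores(scores, prompt_ids))


def search_subset(scores, k, n_iters=200, seed=0):
    """Hill climbing (an assumed stand-in for an iterative search):
    swap one prompt at a time and keep swaps that do not lower tau."""
    rng = random.Random(seed)
    n_prompts = len(scores[0])
    subset = rng.sample(range(n_prompts), k)
    best = subset_quality(scores, subset)
    for _ in range(n_iters):
        candidate = list(subset)
        outside = [p for p in range(n_prompts) if p not in candidate]
        candidate[rng.randrange(k)] = rng.choice(outside)
        q = subset_quality(scores, candidate)
        if q >= best:
            subset, best = candidate, q
    return subset, best


# Synthetic data: 8 hypothetical models, 60 prompts, noisy per-prompt scores.
rng = random.Random(1)
n_models, n_prompts = 8, 60
quality = [rng.random() for _ in range(n_models)]
scores = [[q + rng.gauss(0, 0.5) for _ in range(n_prompts)] for q in quality]

random_subset = rng.sample(range(n_prompts), 5)
searched_subset, searched_tau = search_subset(scores, k=5)
print("random subset tau:  ", subset_quality(scores, random_subset))
print("searched subset tau:", searched_tau)
```

A faithful comparison would split the models into a search set and a held-out set and report the correlation on the held-out models only, since the abstract's claim concerns unseen models.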