FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models

In recent years, there has been significant progress in the development of text-to-image generative models. Evaluating the quality of these generative models is an essential step in the development process. Unfortunately, evaluation can consume a significant amount of computational resources, making the required periodic evaluation of model performance (e.g., monitoring training progress) impractical. Therefore, we seek to improve evaluation efficiency by selecting a representative subset of the text-image dataset. We systematically investigate the design choices, including the selection criteria (textual features or image-based metrics) and the selection granularity (prompt-level or set-level). We find that the insights from prior work on subset selection for training data do not generalize to this problem, and we propose FlashEval, an iterative search algorithm tailored to evaluation data selection. We demonstrate the effectiveness of FlashEval on ranking diffusion models with various configurations, including architectures, quantization levels, and sampler schedules, on the COCO and DiffusionDB datasets. On unseen models, our searched 50-item subset achieves evaluation quality comparable to a randomly sampled 500-item subset of COCO annotations, a 10x evaluation speedup. We release the condensed subsets of these commonly used datasets to help facilitate diffusion algorithm design and evaluation, and open-source FlashEval as a tool for condensing future datasets, accessible at https://github.com/thu-nics/FlashEval.
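To make the idea of a "representative subset" concrete, here is a minimal sketch of one way to judge a prompt subset: check how well the model ranking it produces agrees with the ranking from the full prompt set. This is not the FlashEval algorithm; it is a plain random search, and the function names (`kendall_tau`, `model_scores`, `search_subset`), the score matrix layout, and the synthetic data are all assumptions made for illustration.

```python
import random
from itertools import combinations


def kendall_tau(a, b):
    """Kendall rank correlation (tau-a) between two equal-length score lists."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


def model_scores(scores, prompt_ids):
    """Mean score of each model over the given prompts.

    scores[m][p] is the (hypothetical) metric value of model m on prompt p.
    """
    return [sum(row[p] for p in prompt_ids) / len(prompt_ids) for row in scores]


def search_subset(scores, subset_size, n_trials, seed=0):
    """Random search (an assumed stand-in, not FlashEval's iterative search):
    keep the sampled subset whose model ranking best matches the full set."""
    rng = random.Random(seed)
    n_prompts = len(scores[0])
    full = model_scores(scores, range(n_prompts))
    best_ids, best_tau = None, -2.0
    for _ in range(n_trials):
        ids = rng.sample(range(n_prompts), subset_size)
        tau = kendall_tau(model_scores(scores, ids), full)
        if tau > best_tau:
            best_ids, best_tau = sorted(ids), tau
    return best_ids, best_tau


# Synthetic data: 8 models of increasing quality, 200 prompts, noisy scores.
data_rng = random.Random(1)
quality = [0.1 * m for m in range(8)]
scores = [[q + data_rng.gauss(0, 1) for _ in range(200)] for q in quality]

subset, tau = search_subset(scores, subset_size=20, n_trials=200)
print(f"best subset of {len(subset)} prompts, Kendall tau vs. full set = {tau:.3f}")
```

A subset chosen this way is fitted to the models used during the search, so its ranking quality should be checked on held-out models, which is the "unseen models" setting the abstract reports.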

