Diffusion models have shown unprecedented success in the task of text-to-image generation. While these models can generate high-quality, realistic images, their computationally intensive sequential denoising has raised societal concerns about computational demands and energy consumption. In response, various efforts have been made to improve inference efficiency. However, most existing efforts take a fixed approach, either simplifying the neural network or optimizing the text prompt. Are the quality improvements from all denoising computations equally perceivable to humans? We observe that images generated from different text prompts may require different amounts of computation, depending on the desired content. This observation motivates us to present BudgetFusion, a novel model that suggests the most perceptually efficient number of diffusion steps before a diffusion model starts to generate an image. This is achieved by predicting multi-level perceptual metrics as a function of the number of diffusion steps. Using the popular Stable Diffusion as an example, we conduct both numerical analyses and user studies. Our experiments show that BudgetFusion saves up to five seconds per prompt without compromising perceptual similarity. We hope this work can initiate efforts toward answering a core question: how much do humans perceptually gain from images created by a generative model, per watt of energy?
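To make the abstract's selection idea concrete, the sketch below shows one plausible way a per-prompt step suggestion could work. Everything here is hypothetical and not the authors' implementation: `predict_quality` stands in for BudgetFusion's learned predictor of multi-level perceptual metrics, and the candidate step counts, tolerance, and quality curve are illustrative placeholders.

```python
# A minimal sketch, assuming a hypothetical predictor `predict_quality(prompt, steps)`
# that estimates a perceptual-similarity score in [0, 1] between the image produced
# with `steps` denoising steps and a fully converged reference image.

def suggest_steps(prompt, candidate_steps=(10, 20, 30, 40, 50), tolerance=0.02):
    """Return the smallest step count whose predicted perceptual quality is
    within `tolerance` of the best candidate's predicted quality."""
    scores = {s: predict_quality(prompt, s) for s in candidate_steps}
    best = max(scores.values())
    for s in sorted(candidate_steps):
        if scores[s] >= best - tolerance:
            return s
    return max(candidate_steps)


def predict_quality(prompt, steps):
    # Placeholder: stands in for a learned model mapping a text prompt to
    # predicted perceptual metrics per step count. A fake saturating curve
    # keeps the sketch runnable end to end.
    return 1.0 - 0.5 ** (steps / 5)


if __name__ == "__main__":
    # With the fake curve above, quality saturates around 30 steps, so the
    # suggestion stops there rather than running all 50.
    print(suggest_steps("a watercolor painting of a lighthouse at dusk"))
```

The key design choice this illustrates is that the prediction happens before any denoising runs, so the saved steps are never computed at all.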