FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model

Recently, conditional diffusion models have gained popularity in numerous applications due to their exceptional generation ability. However, many existing methods are training-required. They need to train a time-dependent classifier or a condition-dependent score estimator, which increases the cost of constructing conditional diffusion models and is inconvenient to transfer across different conditions. Some current works aim to overcome this limitation by proposing training-free solutions, but most can only be applied to a specific category of tasks and not to more general conditions. In this work, we propose a training-Free conditional Diffusion Model (FreeDoM) used for various conditions. Specifically, we leverage off-the-shelf pre-trained networks, such as a face detection model, to construct time-independent energy functions, which guide the generation process without requiring training. Furthermore, because the construction of the energy function is very flexible and adaptable to various conditions, our proposed FreeDoM has a broader range of applications than existing training-free methods. FreeDoM is advantageous in its simplicity, effectiveness, and low cost. Experiments demonstrate that FreeDoM is effective for various conditions and suitable for diffusion models of diverse data domains, including image and latent code domains.

翻译：近期，条件扩散模型因其卓越的生成能力在众多应用领域受到广泛关注。然而，现有方法大多需要训练，例如训练随时间变化的分类器或依赖条件的分数估计器，这增加了构建条件扩散模型的成本，且不利于在不同条件间迁移。部分当前工作旨在通过提出免训练方案来克服这一局限，但多数仅适用于特定任务类别，无法推广至更通用的条件。本文提出一种面向多种条件的免训练条件扩散模型（FreeDoM）。具体而言，我们利用现成预训练网络（如人脸检测模型）构建与时间无关的能量函数，无需训练即可引导生成过程。此外，由于能量函数的构建方式高度灵活且适用于多种条件，所提出的FreeDoM比现有免训练方法具有更广泛的应用范围。FreeDoM的优势在于其简洁性、高效性与低成本。实验表明，FreeDoM对多种条件均有效，并适用于包括图像域和隐编码域在内的多样化数据域扩散模型。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

CVPR2022 | 多模态Transformer用于视频分割效果惊艳

专知会员服务

42+阅读 · 2022年3月12日

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

CVPR 2021｜无需干净图像的自监督图像降噪

专知会员服务

39+阅读 · 2021年3月29日

【Google】平滑对抗训练，Smooth Adversarial Training

专知会员服务

49+阅读 · 2020年7月4日