Recently, conditional diffusion models have gained popularity in numerous applications due to their exceptional generation ability. However, many existing methods require training: they need a time-dependent classifier or a condition-dependent score estimator, which increases the cost of constructing conditional diffusion models and makes them inconvenient to transfer across different conditions. Some recent works aim to overcome this limitation with training-free solutions, but most of them apply only to a specific category of tasks rather than to more general conditions. In this work, we propose a training-Free conditional Diffusion Model (FreeDoM) that can be used for various conditions. Specifically, we leverage off-the-shelf pre-trained networks, such as a face detection model, to construct time-independent energy functions that guide the generation process without any training. Furthermore, because the energy function can be constructed flexibly and adapted to various conditions, FreeDoM has a broader range of applications than existing training-free methods. FreeDoM's advantages are its simplicity, effectiveness, and low cost. Experiments demonstrate that FreeDoM is effective for various conditions and is suitable for diffusion models in diverse data domains, including the image and latent code domains.
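To make the idea of energy-guided, training-free sampling concrete, here is a minimal NumPy sketch in the general spirit described above. It is not FreeDoM's actual algorithm, and every component is an assumption for illustration: a DDPM-style linear noise schedule, a toy unconditional noise predictor that is exact only for data drawn from N(0, I) (standing in for a pre-trained diffusion model), a hypothetical energy 0.5 * ||x0_hat - target||^2 (standing in for a distance computed by an off-the-shelf network), and a hypothetical constant guidance step size `rho`. At each step the sampler estimates the clean sample `x0_hat` from `x_t`, evaluates the energy on that estimate, and nudges the unconditional update against the energy gradient with respect to `x_t`.

```python
import numpy as np


def make_schedule(T=100, beta_min=1e-4, beta_max=0.02):
    # Assumed DDPM-style linear beta schedule.
    betas = np.linspace(beta_min, beta_max, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_eps_model(x_t, alpha_bar):
    # Hypothetical stand-in for a pre-trained unconditional noise predictor.
    # For data x0 ~ N(0, I), the optimal predictor is E[eps | x_t] = sqrt(1 - alpha_bar) * x_t.
    return np.sqrt(1.0 - alpha_bar) * x_t


def energy_grad(x_t, eps_hat, alpha_bar, target):
    # Clean-sample estimate x0_hat from x_t and the predicted noise.
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar) * eps_hat) / np.sqrt(alpha_bar)
    # Hypothetical energy E = 0.5 * ||x0_hat - target||^2.
    # For the toy predictor, x0_hat = sqrt(alpha_bar) * x_t, so by the chain rule
    # dE/dx_t = sqrt(alpha_bar) * (x0_hat - target).
    return np.sqrt(alpha_bar) * (x0_hat - target)


def sample(n, dim, target=None, rho=0.1, T=100, seed=0):
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(T)
    x = rng.standard_normal((n, dim))  # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps_hat = toy_eps_model(x, alpha_bars[t])
        # Unconditional DDPM ancestral step with sigma_t^2 = beta_t.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x_prev = mean + np.sqrt(betas[t]) * noise
        if target is not None:
            # Energy guidance: move against the gradient of E with respect to x_t.
            x_prev = x_prev - rho * energy_grad(x, eps_hat, alpha_bars[t], target)
        x = x_prev
    return x


target = np.array([3.0, 3.0])
plain = sample(2000, 2)                   # unconditional samples
guided = sample(2000, 2, target=target)  # energy-guided samples
print(plain.mean(axis=0), guided.mean(axis=0))
```

With a real model, the gradient would instead come from automatic differentiation through both the denoiser and the pre-trained network; the closed-form gradient above works only because the toy predictor is linear in `x_t`. No weights are trained anywhere in the sketch: conditioning enters solely through the energy.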