Taming the generation outcome of state-of-the-art Diffusion and Flow-Matching (FM) models without re-training a task-specific model unlocks a powerful tool for solving inverse problems, conditional generation, and controlled generation in general. In this work we introduce D-Flow, a simple framework for controlling the generation process by differentiating through the flow and optimizing the source (noise) point. We motivate this framework with our key observation that, for Diffusion/FM models trained with Gaussian probability paths, differentiating through the generation process projects the gradient onto the data manifold, implicitly injecting the prior into the optimization process. We validate our framework on linear and non-linear controlled generation problems, including image and audio inverse problems and conditional molecule generation, reaching state-of-the-art performance on all of them.
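The idea of optimizing the source point by differentiating through the flow can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data distribution is a 2D axis-aligned Gaussian with parameters `MU` and `SIGMA`, the trained network is replaced by the closed-form marginal velocity field of that Gaussian under the linear path x_t = (1 - t) x_0 + t x_1, the ODE is solved with plain Euler steps, and the gradient is propagated by a hand-written chain rule instead of automatic differentiation. The names `coeff`, `solve_flow`, and `optimize_source` are hypothetical.

```python
# Toy sketch (assumed setup, not the authors' implementation).
# Data ~ N(MU, diag(SIGMA^2)); the second axis has almost no variance,
# so the "data manifold" is approximately the first axis.
MU = [0.0, 0.0]
SIGMA = [1.0, 0.05]
STEPS = 100


def coeff(t, sigma):
    """Slope c(t) of the marginal velocity u_t(x) = mu + c(t) * (x - t * mu)."""
    return (t * sigma**2 - (1.0 - t)) / ((1.0 - t) ** 2 + (t * sigma) ** 2)


def solve_flow(x0):
    """Euler-integrate dx/dt = u_t(x) from t=0 to t=1.

    Returns x(1) and the per-axis derivative dx(1)/dx(0).
    """
    x = list(x0)
    jac = [1.0] * len(x0)
    h = 1.0 / STEPS
    for k in range(STEPS):
        t = k * h
        for i in range(len(x)):
            c = coeff(t, SIGMA[i])
            x[i] += h * (MU[i] + c * (x[i] - t * MU[i]))
            jac[i] *= 1.0 + h * c  # chain rule through this Euler step
    return x, jac


def optimize_source(y, x0, lr=0.5, iters=50):
    """Gradient descent on the source point x0 for the cost 0.5 * ||x(1) - y||^2."""
    x0 = list(x0)
    for _ in range(iters):
        x1, jac = solve_flow(x0)
        grad_x1 = [a - b for a, b in zip(x1, y)]         # gradient w.r.t. x(1)
        grad_x0 = [j * g for j, g in zip(jac, grad_x1)]  # pulled back to x0
        x0 = [a - lr * g for a, g in zip(x0, grad_x0)]
    return x0, solve_flow(x0)[0]


if __name__ == "__main__":
    x0, x1 = optimize_source(y=[2.0, 2.0], x0=[0.0, 0.0])
    print("optimized source:", x0)
    print("generated sample:", x1)
```

In this toy setting the derivative of the generated sample with respect to the source point is close to 1 along the first axis and close to `SIGMA[1]` along the second. The gradient is therefore strongly damped in the direction where the data has almost no variance, and the generated sample matches the target along the first axis while moving only slightly off the high-density region along the second. This is a simplified picture of how the prior can enter the optimization implicitly; a practical version would presumably use a trained velocity network and automatic differentiation through the ODE solver.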