Point-based interactive editing serves as an essential tool to complement the controllability of existing generative models. A concurrent work, DragDiffusion, updates the diffusion latent map in response to user inputs, which alters the latent map globally. This leads to imprecise preservation of the original content and to editing failures caused by vanishing gradients. In contrast, we present DragNoise, which offers robust and accelerated editing without retracing the latent map. The core rationale of DragNoise is to use the noise predicted by each U-Net pass as a semantic editor. This approach is grounded in two critical observations: first, the bottleneck features of the U-Net are inherently semantically rich and therefore well suited to interactive editing; second, high-level semantics, established early in the denoising process, vary only minimally in subsequent stages. Leveraging these insights, DragNoise edits diffusion semantics in a single denoising step and efficiently propagates these changes, ensuring stability and efficiency in diffusion editing. Comparative experiments show that DragNoise achieves superior control and semantic retention while reducing the optimization time by over 50% compared to DragDiffusion. Our code is available at https://github.com/haofengl/DragNoise.
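The "edit once, then propagate" idea can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: `encode`, `decode`, `ddim_step`, `denoise`, `edit_fn`, `edit_step` and `stop_step` are hypothetical names, the "U-Net" is a two-line stand-in, and `edit_fn` is a placeholder rather than the drag optimization DragNoise actually performs. The sketch shows only the data flow. The bottleneck feature is edited once at a single denoising step and then substituted at later steps, while the latent map itself is never re-optimized.

```python
import numpy as np

# Hypothetical toy "U-Net" split into encoder -> bottleneck -> decoder.
# This is NOT the DragNoise architecture; only the data flow matters here.
def encode(x_t, t):
    return np.tanh(x_t + 0.01 * t)  # toy bottleneck features


def decode(bottleneck):
    return 0.5 * bottleneck  # toy predicted noise eps_theta


def ddim_step(x_t, eps, a_t, a_prev):
    """Deterministic DDIM update (eta = 0) from step t to step t - 1."""
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    return np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps


def denoise(x_T, alphas_bar, edit_step=None, stop_step=0, edit_fn=None):
    """Toy DDIM loop over t = T, ..., 1.

    At `edit_step` the bottleneck feature is edited once by `edit_fn`
    (a hypothetical placeholder for the semantic edit). The edited feature
    is then substituted at every later step with t >= stop_step; below
    `stop_step` the toy U-Net runs unmodified. The latent x is never
    optimized directly.
    """
    x = x_T
    edited = None
    T = len(alphas_bar) - 1
    for t in range(T, 0, -1):
        h = encode(x, t)
        if edit_fn is not None and t == edit_step:
            edited = edit_fn(h)  # single-step edit of the bottleneck
        if edited is not None and t >= stop_step:
            h = edited  # propagate by substituting the edited feature
        eps = decode(h)
        x = ddim_step(x, eps, alphas_bar[t], alphas_bar[t - 1])
    return x
```

For example, with `alphas_bar = np.linspace(0.999, 0.1, 51)`, the call `denoise(x_T, alphas_bar, edit_step=35, stop_step=10, edit_fn=lambda h: np.roll(h, 1, axis=-1))` edits at step 35 and reuses the edited features down to step 10. Substituting one fixed tensor is simply the most basic propagation rule available; it is not claimed to match the paper's exact schedule.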