Conditional diffusion image generators can be repurposed for editing through inversion, without the need for large-scale paired fine-tuning data. However, producing high-quality, targeted edits while maintaining image identity and global consistency remains challenging, because weakly conditioned inversion often embeds conflicting image features into the noise. We demonstrate that incorporating a residual image encoding as additional conditioning improves both identity preservation and editability. We optimize this residual encoding to provide a strong conditioning signal for reconstruction, thereby reducing reliance on inversion and susceptibility to its pitfalls. To ensure that the residual does not interfere with desired edits, we incorporate a gradient-reversal-based optimization strategy that disentangles the residual from the edited condition. We illustrate our method's ability to produce high-fidelity results on precise intrinsic-based editing and relighting, and show proof-of-concept text-guided manipulation.
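As a rough illustration of the gradient-reversal idea mentioned in the abstract, the sketch below shows the general mechanism on a toy scalar problem: the forward pass is the identity, while the backward pass negates (and scales) the gradient. The abstract does not specify the actual architecture or losses, so everything here is an assumption made for illustration: the scalar linear "encoder" `w`, the linear "adversary" `v`, the squared-error loss, and the hypothetical helper names `grl_forward`, `grl_backward`, and `adversarial_grads`. Gradients are written out by hand so the example runs in plain Python without a deep-learning framework.

```python
# Toy gradient-reversal sketch (illustrative assumptions only, not the paper's method).

def grl_forward(x):
    """Gradient reversal layer, forward pass: identity."""
    return x


def grl_backward(grad_output, lam):
    """Gradient reversal layer, backward pass: flip the sign and scale by lam."""
    return -lam * grad_output


def adversarial_grads(w, v, x, c, lam=1.0):
    """Return (loss, grad_w, grad_v) for one toy training example.

    w   : encoder weight; the residual encoding is r = w * x (assumed form)
    v   : adversary weight; it tries to predict the edited condition c from r
    lam : reversal strength (hypothetical hyperparameter)
    """
    r = w * x                       # residual encoding
    r_rev = grl_forward(r)          # unchanged in the forward pass
    c_hat = v * r_rev               # adversary's prediction of the condition
    loss = (c_hat - c) ** 2         # adversary's squared error

    # Backward pass, written out by hand.
    dloss_dchat = 2.0 * (c_hat - c)
    grad_v = dloss_dchat * r_rev    # adversary receives the ordinary gradient
    grad_r_rev = dloss_dchat * v
    grad_r = grl_backward(grad_r_rev, lam)  # encoder receives the reversed gradient
    grad_w = grad_r * x
    return loss, grad_w, grad_v


if __name__ == "__main__":
    loss, grad_w, grad_v = adversarial_grads(w=1.0, v=1.0, x=2.0, c=1.0)
    print(loss, grad_w, grad_v)
```

A gradient-descent step on `v` lowers the adversary's error, while the same step on `w` raises it, because `grad_w` has the opposite sign of the unreversed gradient. In this toy setting, that pushes the encoding `r` away from carrying information about `c`, which is the kind of disentanglement between residual and edited condition that the abstract describes.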