Text-conditional image editing is a recently emerged task with considerable practical potential. Most current methods for editing real images first reconstruct the image and then perform the edit on top of that reconstruction. Reconstruction is usually done with DDIM Inversion, which often fails to guarantee reconstruction quality, i.e., it produces results that do not preserve the original image content. To address this reconstruction failure, we propose FEC, which consists of three sampling methods, each designed for different editing types and settings. The three FEC methods achieve two important goals in image editing: 1) they ensure successful reconstruction, i.e., sampling yields a generated result that preserves the texture and features of the original real image; and 2) they can be paired with many editing methods and greatly improve the performance of those methods across various editing tasks. In addition, none of our sampling methods requires fine-tuning the diffusion model or time-consuming training on large-scale datasets, which significantly reduces time cost as well as memory and computation.
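The reconstruction failure attributed to DDIM Inversion above can be illustrated with a small toy sketch. This is not the FEC method and not a real diffusion model: the noise predictor is replaced by hypothetical stand-in functions, the schedule `make_alpha_bars` is an assumed linear schedule, and all function names are chosen here for illustration. The sketch relies only on the standard deterministic DDIM update. Inversion has to evaluate the noise predictor at the current state because the next state is not yet known, so inverting and then sampling is exact only when the predicted noise does not change between neighbouring steps.

```python
import numpy as np

def make_alpha_bars(num_steps=10, start=0.9999, end=0.02):
    # Assumed toy schedule: cumulative alpha products, decreasing with t.
    return np.linspace(start, end, num_steps + 1)

def ddim_step(x, eps, a_from, a_to):
    # Deterministic DDIM update between two noise levels.
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps

def ddim_invert(x0, eps_model, alpha_bars):
    # Image -> noise. The noise estimate is taken at x_t because
    # x_{t+1} is unknown; this is the approximation that can break
    # reconstruction.
    x = x0.copy()
    for t in range(len(alpha_bars) - 1):
        eps = eps_model(x, t)
        x = ddim_step(x, eps, alpha_bars[t], alpha_bars[t + 1])
    return x

def ddim_sample(x_T, eps_model, alpha_bars):
    # Noise -> image, using the same update in the opposite direction.
    x = x_T.copy()
    for t in range(len(alpha_bars) - 1, 0, -1):
        eps = eps_model(x, t)
        x = ddim_step(x, eps, alpha_bars[t], alpha_bars[t - 1])
    return x

def reconstruction_error(x0, eps_model, alpha_bars=None):
    # Largest absolute difference after inverting and re-sampling.
    if alpha_bars is None:
        alpha_bars = make_alpha_bars()
    x_T = ddim_invert(x0, eps_model, alpha_bars)
    x_rec = ddim_sample(x_T, eps_model, alpha_bars)
    return float(np.max(np.abs(x_rec - x0)))

# Hypothetical stand-ins for a trained noise predictor.
def constant_eps(x, t):
    # State-independent noise: the inversion approximation is exact.
    return np.full_like(x, 0.3)

def state_dependent_eps(x, t):
    # State-dependent noise: the approximation introduces error.
    return np.tanh(x)

if __name__ == "__main__":
    x0 = np.array([0.5, -1.0, 2.0])  # toy "image" with three pixels
    print("constant predictor error:", reconstruction_error(x0, constant_eps))
    print("state-dependent predictor error:",
          reconstruction_error(x0, state_dependent_eps))
```

In this toy setting the constant predictor reconstructs the input up to floating-point error, while the state-dependent predictor does not. A trained diffusion model is state-dependent, and the mismatch is generally expected to be more pronounced with few sampling steps or with classifier-free guidance.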