Prompt-based models have demonstrated impressive prompt-following capability on image editing tasks. However, these models still struggle to follow detailed editing prompts or perform local edits. In particular, global image quality often deteriorates after even a single editing step. To address these challenges, we introduce SPICE, a training-free workflow that accepts arbitrary resolutions and aspect ratios, accurately follows user requirements, and consistently improves image quality across more than 100 editing steps, while keeping the unedited regions intact. By synergizing the strengths of a base diffusion model and a Canny edge ControlNet model, SPICE robustly handles free-form editing instructions from the user. On a challenging realistic image-editing dataset, SPICE quantitatively outperforms state-of-the-art baselines and is consistently preferred by human annotators. We release the workflow implementation for popular diffusion-model Web UIs to support further research and artistic exploration.
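The abstract does not spell out SPICE's workflow, but the core pairing it names, a base diffusion model constrained by a Canny edge ControlNet, can be sketched with the Hugging Face diffusers library. The snippet below is a minimal illustration of one structure-preserving editing step, not the authors' implementation; the model IDs, Canny thresholds, and denoising strength are assumptions chosen for illustration.

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

# Assumed models: a standard SD 1.5 base and the public Canny ControlNet.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("input.png").convert("RGB")

# Canny edges of the input act as a structural constraint, so the edit
# changes appearance where the prompt asks while preserving layout.
edges = cv2.Canny(np.array(image), 100, 200)  # thresholds are illustrative
edge_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

result = pipe(
    prompt="a red sports car parked by the beach",  # example edit instruction
    image=image,             # init image for the img2img step
    control_image=edge_image,  # edge map fed to the ControlNet
    strength=0.6,            # lower strength keeps more of the original
    num_inference_steps=30,
).images[0]
result.save("edited.png")
```

Iterating such a step (re-extracting edges from each intermediate result) is one plausible way a structure-constrained loop could avoid the quality drift the abstract describes, though the paper's actual mechanism may differ.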