Image inpainting aims to fill in missing pixels with visually coherent and semantically plausible content. Despite the great progress brought by deep generative models, the task still suffers from (i) the difficulty of collecting large-scale realistic data and the cost of model training; and (ii) the intrinsic limitations of conventional user-defined binary masks when applied to objects with unclear boundaries or transparent textures. In this paper, we propose MagicRemover, a tuning-free method that leverages powerful diffusion models for text-guided image inpainting. We introduce an attention guidance strategy that constrains the sampling process of diffusion models, enabling both the erasure of instructed areas and the restoration of occluded content. We further propose a classifier optimization algorithm that improves denoising stability with fewer sampling steps. We compare MagicRemover extensively with state-of-the-art methods through quantitative evaluation and a user study, demonstrating that MagicRemover significantly improves high-quality image inpainting. We will release our code at https://github.com/exisas/Magicremover.