Recent progress in text-guided image inpainting, building on the unprecedented success of text-to-image diffusion models, has produced exceptionally realistic and visually plausible results. However, current text-to-image inpainting models still leave significant room for improvement, particularly in aligning the inpainted region with the user prompt and in performing high-resolution inpainting. In this paper, we therefore introduce HD-Painter, a completely training-free approach that accurately follows prompts and coherently scales to high-resolution image inpainting. To this end, we design the Prompt-Aware Introverted Attention (PAIntA) layer, which enhances self-attention scores with prompt information and yields generations better aligned with the text. To further improve prompt coherence, we introduce the Reweighting Attention Score Guidance (RASG) mechanism, which seamlessly integrates a post-hoc sampling strategy into the general form of DDIM to prevent out-of-distribution latent shifts. Moreover, HD-Painter extends to larger scales through a specialized super-resolution technique customized for inpainting, enabling the completion of missing regions in images of up to 2K resolution. Our experiments demonstrate that HD-Painter surpasses existing state-of-the-art approaches both qualitatively and quantitatively, achieving a generation accuracy of 61.4% versus 51.9%. The code will be made publicly available at: https://github.com/Picsart-AI-Research/HD-Painter
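The abstract states that RASG integrates a post-hoc guidance signal into the general form of DDIM instead of shifting latents directly. As a hedged illustration only, the sketch below implements the standard general DDIM update, whose noise scale σ_t is controlled by η, and then, as an assumption about how such an integration could look, substitutes a guidance gradient rescaled to unit standard deviation for the Gaussian noise term. The helper `gradient_as_noise` and the random stand-in gradient are hypothetical; the actual attention-score objective used by RASG is not reproduced here.

```python
import numpy as np


def ddim_sigma(alpha_t: float, alpha_prev: float, eta: float) -> float:
    """Noise scale of the general DDIM step (eta=0: deterministic DDIM, eta=1: DDPM-like)."""
    return eta * np.sqrt((1.0 - alpha_prev) / (1.0 - alpha_t)) * np.sqrt(1.0 - alpha_t / alpha_prev)


def ddim_step(x_t, eps_pred, alpha_t, alpha_prev, sigma_t, noise):
    """One general DDIM update x_t -> x_{t-1}.

    alpha_t and alpha_prev are the cumulative products (alpha-bar) at steps t and t-1.
    """
    # Estimate of the clean sample implied by the noise prediction.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_t) * eps_pred) / np.sqrt(alpha_t)
    # Deterministic direction toward x_t, shrunk to leave room for the noise term.
    dir_xt = np.sqrt(1.0 - alpha_prev - sigma_t ** 2) * eps_pred
    return np.sqrt(alpha_prev) * x0_pred + dir_xt + sigma_t * noise


def gradient_as_noise(grad, eps=1e-12):
    """Hypothetical helper: rescale a guidance gradient to unit standard deviation
    so it can stand in for the Gaussian noise term of the general DDIM step."""
    return grad / max(float(np.std(grad)), eps)


# Usage with synthetic latents (all values are illustrative).
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 8))
eps = rng.standard_normal((4, 8, 8))
alpha_t, alpha_prev = 0.5, 0.7
x_t = np.sqrt(alpha_t) * x0 + np.sqrt(1.0 - alpha_t) * eps

# Stand-in for the gradient of an attention-based guidance objective (hypothetical).
grad = rng.standard_normal(x_t.shape) * 3.0 + 1.0
sigma = ddim_sigma(alpha_t, alpha_prev, eta=1.0)
x_prev = ddim_step(x_t, eps, alpha_t, alpha_prev, sigma, gradient_as_noise(grad))
```

Setting η = 0 removes the noise term, so guidance injected this way only acts when σ_t > 0. Rescaling the gradient keeps the injected term at the magnitude the sampler expects from Gaussian noise, which is one plausible reading of the abstract's goal of avoiding out-of-distribution latent shifts.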