Recent progress in text-guided image inpainting, driven by the unprecedented success of text-to-image diffusion models, has led to exceptionally realistic and visually plausible results. However, current text-to-image inpainting models still have significant room for improvement, particularly in aligning the inpainted region with the user prompt and in performing high-resolution inpainting. In this paper we therefore introduce HD-Painter, a completely training-free approach that accurately follows prompts and coherently scales to high-resolution image inpainting. To this end, we design the Prompt-Aware Introverted Attention (PAIntA) layer, which enhances self-attention scores with prompt information and thereby yields generations that are better aligned with the text. To further improve prompt coherence, we introduce the Reweighting Attention Score Guidance (RASG) mechanism, which seamlessly integrates a post-hoc sampling strategy into the general form of DDIM to prevent out-of-distribution latent shifts. Moreover, HD-Painter extends to larger scales through a specialized super-resolution technique customized for inpainting, enabling the completion of missing regions in images of up to 2K resolution. Our experiments demonstrate that HD-Painter surpasses existing state-of-the-art approaches both qualitatively and quantitatively, achieving a generation accuracy of 61.4% versus 51.9%. We will make the code publicly available at: https://github.com/Picsart-AI-Research/HD-Painter
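The abstract states that RASG builds on the general form of DDIM. For reference, below is a minimal NumPy sketch of that general, η-parameterized DDIM update (Song et al., 2020), in which a noise scale σ_t controls how much fresh noise enters each step. The helper `standardized_direction` and the idea of placing a rescaled guidance direction in the noise slot are illustrative assumptions about how a post-hoc guidance signal could be injected while keeping the latent statistics the sampler expects. They are not taken from the HD-Painter code, and all names, shapes, and ᾱ values here are hypothetical.

```python
import numpy as np


def ddim_sigma(alpha_bar_t, alpha_bar_prev, eta):
    """Noise scale sigma_t of the generalized DDIM sampler.

    eta = 0 gives deterministic DDIM; eta = 1 matches DDPM-style ancestral sampling.
    """
    return eta * np.sqrt((1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)) * np.sqrt(
        1.0 - alpha_bar_t / alpha_bar_prev
    )


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev, eta, noise):
    """One general DDIM step x_t -> x_{t-1}.

    `noise` fills the slot of eps_t ~ N(0, I) in the update rule.
    """
    sigma = ddim_sigma(alpha_bar_t, alpha_bar_prev, eta)
    # Estimate of the clean latent x_0 implied by the noise prediction.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Deterministic "direction pointing to x_t", shrunk to leave room for sigma.
    dir_xt = np.sqrt(1.0 - alpha_bar_prev - sigma**2) * eps_pred
    return np.sqrt(alpha_bar_prev) * x0_pred + dir_xt + sigma * noise


def standardized_direction(grad, tiny=1e-8):
    """HYPOTHETICAL helper (not from HD-Painter): rescale a guidance gradient
    to unit standard deviation so it matches the scale of the Gaussian noise
    it replaces in the sigma * noise term."""
    return grad / (grad.std() + tiny)


# Demo on random arrays; the latent shape and alpha_bar values are arbitrary.
rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 64, 64))
eps_hat = rng.standard_normal(latent.shape)
# Stand-in for the gradient of some guidance objective w.r.t. the latent.
fake_grad = 0.01 * rng.standard_normal(latent.shape)

# Ordinary stochastic step: fresh Gaussian noise in the sigma_t slot.
x_prev_stochastic = ddim_step(latent, eps_hat, 0.5, 0.6, eta=1.0,
                              noise=rng.standard_normal(latent.shape))
# Hypothetical guided step: rescaled gradient direction in the same slot.
x_prev_guided = ddim_step(latent, eps_hat, 0.5, 0.6, eta=1.0,
                          noise=standardized_direction(fake_grad))
```

Because the substituted direction is rescaled to the same standard deviation as the noise it replaces, the σ_t term keeps its expected magnitude. This is one plausible reading of how a post-hoc signal can steer sampling without pushing latents out of distribution. Adding a raw, unscaled gradient step on top of the update carries no such guarantee.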