High-quality versatile image inpainting, in which user-specified regions are filled with plausible content according to user intent, remains a significant challenge. Existing methods struggle to handle context-aware image inpainting and text-guided object inpainting simultaneously, because the two tasks require distinct optimal training strategies. To overcome this challenge, we introduce PowerPaint, the first high-quality and versatile inpainting model that excels in both tasks. First, we introduce learnable task prompts along with tailored fine-tuning strategies to explicitly guide the model's focus toward different inpainting targets. This enables PowerPaint to accomplish various inpainting tasks by switching task prompts, achieving state-of-the-art performance. Second, we demonstrate the versatility of the task prompts by showing that a task prompt is effective as a negative prompt for object removal. Additionally, we leverage prompt interpolation to enable controllable shape-guided object inpainting. Finally, we extensively evaluate PowerPaint on various inpainting benchmarks to demonstrate its superior performance for versatile image inpainting. We release our code and models on our project page: https://powerpaint.github.io/.
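
The two prompt-level mechanisms named above, prompt interpolation and the use of a task prompt as a negative prompt, can be illustrated with a minimal NumPy sketch. It is not PowerPaint's implementation: the function names, the linear blending rule, and the standard classifier-free guidance formula are assumptions made for illustration, and which task prompt plays the positive or negative role is left to the caller.

```python
import numpy as np


def interpolate_prompts(e_a, e_b, alpha):
    """Blend two prompt embeddings linearly (hypothetical helper).

    alpha = 1.0 returns e_a, alpha = 0.0 returns e_b. Linear blending is
    an assumption; the abstract only says that prompts are interpolated.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * np.asarray(e_a) + (1.0 - alpha) * np.asarray(e_b)


def guided_noise(eps_pos, eps_neg, scale):
    """Standard classifier-free guidance with a negative prompt.

    eps_pos / eps_neg are the denoiser's noise predictions conditioned on
    the positive and negative prompt. The result moves away from eps_neg
    along the direction (eps_pos - eps_neg), scaled by `scale`.
    """
    eps_pos = np.asarray(eps_pos)
    eps_neg = np.asarray(eps_neg)
    return eps_neg + scale * (eps_pos - eps_neg)


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for two learned task prompts.
    e_a = np.array([1.0, 0.0])
    e_b = np.array([0.0, 1.0])
    print(interpolate_prompts(e_a, e_b, 0.25))

    # Toy noise predictions standing in for two denoiser forward passes.
    print(guided_noise(np.array([2.0]), np.array([1.0]), 3.0))
```

With `scale = 1.0` the guided prediction reduces to the positive-prompt prediction, so the negative prompt only has an effect for larger guidance scales.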