Existing image manipulation work focuses primarily on static manipulation, such as replacing specific regions of an image or altering its overall style. In this paper, we introduce a dynamic manipulation task, subject repositioning: relocating a user-specified subject to a desired position while preserving the image's fidelity. We show that the fundamental sub-tasks of subject repositioning, namely filling the void left by the repositioned subject, reconstructing obscured portions of the subject, and blending the subject so that it is consistent with its surroundings, can be reformulated as a single prompt-guided inpainting task. Consequently, one diffusion generative model can address all of these sub-tasks, using task prompts learned through our proposed task inversion technique. We additionally integrate pre-processing and post-processing techniques to further improve the quality of subject repositioning. Together, these elements form our SEgment-gEnerate-and-bLEnd (SEELE) framework. To assess SEELE's effectiveness, we assemble a real-world subject repositioning dataset called ReS. Results on ReS demonstrate the quality of the repositioned images that SEELE generates.
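To make the decomposition into sub-tasks concrete, the following is a minimal sketch of the bookkeeping that precedes any inpainting: given a subject mask and a displacement, it computes where the subject lands and which vacated pixels form the void to be filled. It is not the SEELE implementation. The function names `shift` and `plan_repositioning`, the boolean-mask representation, and the zero-filled void are assumptions made for illustration, and the diffusion inpainting model, the learned task prompts, subject completion, and blending are all omitted.

```python
import numpy as np


def shift(arr, dy, dx):
    """Translate an array by (dy, dx) rows/columns, zero-padded (no wrap-around)."""
    out = np.zeros_like(arr)
    h, w = arr.shape[:2]
    if abs(dy) >= h or abs(dx) >= w:
        return out  # subject moved entirely out of frame
    # Source and destination windows that stay inside the frame.
    src_y = slice(max(0, -dy), min(h, h - dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_y = slice(max(0, dy), min(h, h + dy))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = arr[src_y, src_x]
    return out


def plan_repositioning(image, subject_mask, dy, dx):
    """Hypothetical helper: naive paste plus the masks an inpainter would need.

    image:        H x W x 3 uint8 array
    subject_mask: H x W boolean array, True on the subject
    Returns (composite, moved_mask, void_mask).
    """
    moved_mask = shift(subject_mask, dy, dx)
    # Void: pixels the subject vacated that its new position does not cover.
    void_mask = subject_mask & ~moved_mask
    composite = image.copy()
    composite[void_mask] = 0  # placeholder; an inpainting model would fill this
    moved_pixels = shift(image * subject_mask[..., None], dy, dx)
    composite[moved_mask] = moved_pixels[moved_mask]
    return composite, moved_mask, void_mask
```

In this sketch, `void_mask` marks the region for the void-filling sub-task, while `moved_mask` marks where the subject now sits. Any part of the subject that was occluded or cut off in the original image would still have to be reconstructed, and the pasted boundary blended, by the generative model the abstract describes.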