A$^\text{T}$A: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting

Image inpainting aims to fill missing regions of an image. Recently, there has been a surge of interest in foreground-conditioned background inpainting, a sub-task that fills in the background of an image given the foreground subject and an associated text prompt. Existing background inpainting methods typically preserve the subject's original position from the source image strictly, which leads to inconsistencies between the subject and the generated background. To address this challenge, we propose a new task, Text-Guided Subject-Position Variable Background Inpainting, which aims to dynamically adjust the subject's position so that the subject and the inpainted background form a harmonious whole, and we propose the Adaptive Transformation Agent (A$^\text{T}$A) for this task. First, we design a PosAgent block that adaptively predicts an appropriate displacement from the given features, making the subject position variable. Second, we design the Reverse Displacement Transform (RDT) module, which arranges multiple PosAgent blocks in a reverse structure to transform hierarchical feature maps from deep to shallow based on semantic information. Third, we equip A$^\text{T}$A with a Position Switch Embedding that controls whether the subject's position in the generated image is adaptively predicted or fixed. Extensive comparative experiments validate the effectiveness of A$^\text{T}$A, which demonstrates superior inpainting capability in subject-position variable inpainting while also maintaining good performance in subject-position fixed inpainting.
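The abstract describes the three components only at a high level. The following toy sketch is an illustration under stated assumptions, not the authors' implementation. It assumes single-channel feature maps stored as nested lists, a pyramid ordered deep (coarse) to shallow (fine) in which each level doubles the previous resolution, and a coarse-to-fine reading of the "reverse structure" in which the displacement found at a deep level is scaled up and refined at the next level. The function names are hypothetical: `pos_agent` is a hand-written centroid-centering rule standing in for a learned PosAgent block, and the boolean `adaptive` flag is a simplified stand-in for the learned Position Switch Embedding.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]  # single-channel H x W map (simplification)


def shift_map(fmap: FeatureMap, dy: int, dx: int) -> FeatureMap:
    """Translate a feature map by (dy, dx); vacated cells are filled with 0."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # source cell that lands on (y, x)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = fmap[sy][sx]
    return out


def pos_agent(fmap: FeatureMap) -> Tuple[int, int]:
    """Hypothetical stand-in for a learned PosAgent block.

    Returns the integer shift that moves the activation centroid of the map
    to the map center. A real PosAgent would be a trained network.
    """
    h, w = len(fmap), len(fmap[0])
    total = sum(sum(row) for row in fmap)
    if total == 0:
        return 0, 0
    cy = sum(y * v for y, row in enumerate(fmap) for v in row) / total
    cx = sum(x * v for row in fmap for x, v in enumerate(row)) / total
    return round((h - 1) / 2 - cy), round((w - 1) / 2 - cx)


def reverse_displacement_transform(
    pyramid: List[FeatureMap], adaptive: bool
) -> List[FeatureMap]:
    """Shift a feature pyramid ordered deep (coarse) -> shallow (fine).

    Assumes each level has twice the resolution of the previous one.
    `adaptive=False` mimics the Position Switch in its "fixed" setting.
    """
    if not adaptive:
        return [[row[:] for row in fmap] for fmap in pyramid]  # unchanged copy
    out: List[FeatureMap] = []
    dy, dx = 0, 0
    for level, fmap in enumerate(pyramid):
        if level > 0:
            dy, dx = 2 * dy, 2 * dx  # carry the coarse displacement to this scale
        # Residual correction predicted after applying the carried displacement.
        ry, rx = pos_agent(shift_map(fmap, dy, dx))
        dy, dx = dy + ry, dx + rx
        # Shift the original map once by the total, avoiding double clipping.
        out.append(shift_map(fmap, dy, dx))
    return out
```

For example, with a 3×3 deep map and a 6×6 shallow map whose subject sits in the top-left corner, the deep level predicts a shift of (1, 1), the shallow level inherits (2, 2) and adds no residual, so the subject ends up centered at both scales; with `adaptive=False` the pyramid is returned unchanged. Letting the deep, semantically rich level decide the coarse move and the shallow levels only refine it is one plausible reading of "from deep to shallow"; the paper's actual design may differ.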
