We introduce DragAPart, a method that, given an image and a set of drags as input, generates a new image of the same object in a new state, compatible with the action of the drags. Unlike prior works that focused on repositioning objects, DragAPart predicts part-level interactions, such as opening and closing a drawer. We study this problem as a proxy for learning a generalist motion model, not restricted to a specific kinematic structure or object category. To this end, we start from a pre-trained image generator and fine-tune it on Drag-a-Move, a new synthetic dataset that we introduce. Combined with a new encoding for the drags and dataset randomization, the resulting model generalizes well to real images and different categories. Compared to prior motion-controlled generators, it demonstrates much better part-level motion understanding.