Drag-Based Image Editing (DBIE), which lets users manipulate images by directly dragging objects within them, has recently attracted considerable attention. However, it faces two key challenges: (\emph{\textcolor{magenta}{i}}) point-based drag is often highly ambiguous and difficult to align with users' intentions; (\emph{\textcolor{magenta}{ii}}) current DBIE methods primarily rely on alternating between motion supervision and point tracking, which is not only cumbersome but also fails to produce high-quality results. These limitations motivate us to approach DBIE from a new perspective: we redefine it as the deformation, rotation, and translation of user-specified handle regions. Requiring users to explicitly specify both the drag regions and the drag types effectively addresses the ambiguity issue. Furthermore, we propose a simple yet effective editing framework, dubbed \textcolor{SkyBlue}{\textbf{DragNeXt}}. It formulates DBIE as a unified Latent Region Optimization (LRO) problem and solves it through Progressive Backward Self-Intervention (PBSI), which simplifies the overall DBIE procedure and further improves editing quality by fully leveraging region-level structure information and progressive guidance from intermediate drag states. We validate \textcolor{SkyBlue}{\textbf{DragNeXt}} on our NextBench benchmark, and extensive experiments demonstrate that it significantly outperforms existing approaches. Code will be released on GitHub.