DragNeXt: Rethinking Drag-Based Image Editing

Drag-Based Image Editing (DBIE), which lets users manipulate images by directly dragging objects within them, has recently attracted considerable attention. However, it faces two key challenges: (i) point-based drag is often highly ambiguous and difficult to align with users' intentions; (ii) current DBIE methods primarily rely on alternating between motion supervision and point tracking, which is not only cumbersome but also fails to produce high-quality results. These limitations motivate us to explore DBIE from a new perspective: redefining it as deformation, rotation, and translation of user-specified handle regions. By requiring users to explicitly specify both drag areas and types, we can effectively address the ambiguity issue. Furthermore, we propose a simple yet effective editing framework, dubbed DragNeXt. It unifies DBIE as a Latent Region Optimization (LRO) problem and solves it through Progressive Backward Self-Intervention (PBSI), simplifying the overall procedure of DBIE while further enhancing quality by fully leveraging region-level structure information and progressive guidance from intermediate drag states. We validate DragNeXt on our NextBench, and extensive experiments demonstrate that our proposed method can significantly outperform existing approaches. Code will be released on GitHub.
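To make the region-level formulation concrete, the following is a minimal sketch of how a drag instruction with an explicit area and type might be represented, and how intermediate drag states could be generated by interpolating the motion. It is purely illustrative and is not the DragNeXt implementation: the names `RegionDrag`, `intermediate_state`, and `progressive_states` are hypothetical, the linear interpolation schedule is an assumption, and deformation is omitted because the abstract does not say how it is parameterized.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class RegionDrag:
    """Hypothetical region-level drag instruction (not the paper's data structure)."""
    region: List[Point]            # coordinates inside the user-specified handle region
    kind: str                      # "translation" or "rotation" (deformation omitted)
    offset: Point = (0.0, 0.0)     # total displacement, used when kind == "translation"
    angle_deg: float = 0.0         # total counter-clockwise angle, used when kind == "rotation"
    center: Point = (0.0, 0.0)     # pivot point for a rotation


def intermediate_state(drag: RegionDrag, t: float) -> List[Point]:
    """Return the region's coordinates after a fraction t (0..1) of the full drag."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if drag.kind == "translation":
        dx, dy = drag.offset
        return [(x + t * dx, y + t * dy) for x, y in drag.region]
    if drag.kind == "rotation":
        theta = math.radians(t * drag.angle_deg)
        cx, cy = drag.center
        c, s = math.cos(theta), math.sin(theta)
        return [
            (cx + c * (x - cx) - s * (y - cy), cy + s * (x - cx) + c * (y - cy))
            for x, y in drag.region
        ]
    raise ValueError(f"unsupported drag kind: {drag.kind!r}")


def progressive_states(drag: RegionDrag, steps: int) -> List[List[Point]]:
    """Assumed linear schedule: evenly spaced intermediate states, ending at the target."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [intermediate_state(drag, k / steps) for k in range(1, steps + 1)]
```

Because the drag type is stated explicitly, the same handle region yields an unambiguous target under each type, which a single pair of handle and target points cannot guarantee.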

