Interactive point-based image editing offers a controllable way to manipulate image content precisely and flexibly. However, most drag-based methods operate primarily on the 2D pixel plane and make limited use of 3D cues. As a result, they often produce imprecise and inconsistent edits, particularly in geometry-intensive scenarios such as rotations and perspective transformations. To address these limitations, we propose GeoDrag, a novel geometry-guided drag-based image editing method that tackles three key challenges: 1) incorporating 3D geometric cues into pixel-level editing, 2) mitigating discontinuities caused by geometry-only guidance, and 3) resolving conflicts arising from multi-point dragging. Built upon a unified displacement field that jointly encodes 3D geometry and 2D spatial priors, GeoDrag enables coherent, high-fidelity, and structure-consistent editing in a single forward pass. In addition, a conflict-free partitioning strategy is introduced to isolate editing regions, effectively preventing interference and ensuring consistency. Extensive experiments across various editing scenarios validate the effectiveness of our method, showing superior precision, structural consistency, and reliable multi-point editability. Project page: https://xinyu-pu.github.io/projects/geodrag.
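To make the two core ideas in the abstract concrete, the following is a minimal, illustrative sketch (not the paper's implementation): it builds a dense displacement field that blends a geometry-informed term with a 2D spatial prior, and partitions pixels among handle points so multi-point drags never conflict. All names and parameters here (`handles`, `targets`, `depth`, `alpha`, `sigma`) are hypothetical choices for illustration; the actual GeoDrag formulation differs.

```python
import numpy as np

def unified_displacement_field(h, w, handles, targets, depth, alpha=0.5, sigma=30.0):
    """Illustrative sketch only, not GeoDrag itself.

    Blends (1) a geometric prior that moves pixels at a similar depth to the
    handle together, and (2) a 2D spatial prior that decays with distance to
    the handle. Pixels are first partitioned to their nearest handle so that
    multiple drag points never write conflicting displacements to one pixel.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs, ys], axis=-1).astype(np.float32)        # (h, w, 2) in (x, y)

    handles = np.asarray(handles, dtype=np.float32)                # (K, 2) handle points
    targets = np.asarray(targets, dtype=np.float32)                # (K, 2) target points
    drags = targets - handles                                      # per-handle drag vectors

    # Conflict-free partition: assign every pixel to its nearest handle point.
    d2 = ((coords[None] - handles[:, None, None]) ** 2).sum(-1)    # (K, h, w) squared distances
    owner = d2.argmin(axis=0)                                      # (h, w) region labels

    field = np.zeros((h, w, 2), dtype=np.float32)
    for k, (hx, hy) in enumerate(handles):
        mask = owner == k
        # 2D spatial prior: Gaussian falloff with distance to the handle.
        w2d = np.exp(-d2[k] / (2.0 * sigma ** 2))
        # Geometric prior: pixels at a similar depth to the handle move together.
        w3d = np.exp(-np.abs(depth - depth[int(hy), int(hx)]))
        weight = alpha * w3d + (1.0 - alpha) * w2d                 # unified weighting
        field[mask] = weight[mask, None] * drags[k]
    return field

# Toy usage: one handle dragged 20 px to the right on a flat depth map.
if __name__ == "__main__":
    depth = np.ones((128, 128), dtype=np.float32)
    f = unified_displacement_field(128, 128, [(40, 64)], [(60, 64)], depth)
    print(f.shape, f[64, 40])   # -> (128, 128, 2), roughly [20., 0.] at the handle
```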