Point-based image editing has attracted remarkable attention since the emergence of DragGAN. Recently, DragDiffusion further improved generative quality by adapting this dragging technique to diffusion models. Despite this great success, the dragging scheme exhibits two major drawbacks, namely inaccurate point tracking and incomplete motion supervision, which may result in unsatisfactory dragging outcomes. To tackle these issues, we build a stable and precise drag-based editing framework, coined StableDrag, by designing a discriminative point tracking method and a confidence-based latent enhancement strategy for motion supervision. The former allows us to precisely locate the updated handle points, thereby boosting the stability of long-range manipulation, while the latter guarantees that the optimized latent remains as high-quality as possible across all manipulation steps. Thanks to these unique designs, we instantiate two types of image editing models, StableDrag-GAN and StableDrag-Diff, which attain more stable dragging performance, as demonstrated through extensive qualitative experiments and quantitative assessment on DragBench.
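For context on the point-tracking step the abstract refers to, the baseline DragGAN-style tracker locates the moved handle point by nearest-neighbor feature matching in a local window. The sketch below illustrates that generic baseline only (it is not StableDrag's discriminative tracker, which the paper proposes as an improvement); all names and the window radius are illustrative assumptions.

```python
import numpy as np

def track_point(feat_map, template, prev_pt, radius=3):
    """Generic nearest-neighbor point tracking (DragGAN-style baseline).

    Searches a local window around the previous handle-point location for
    the pixel whose feature vector is closest to the initial template.
    This is the scheme whose inaccuracy StableDrag targets; the function
    name and parameters are illustrative, not from the paper.

    feat_map: (H, W, C) feature map from the generator backbone.
    template: (C,) feature of the handle point recorded at the first step.
    prev_pt:  (row, col) current estimate of the handle-point location.
    """
    h, w, _ = feat_map.shape
    r0, c0 = prev_pt
    best_dist, best_pt = np.inf, prev_pt
    # Scan the (2*radius+1)^2 window, clipped to the feature-map bounds.
    for r in range(max(0, r0 - radius), min(h, r0 + radius + 1)):
        for c in range(max(0, c0 - radius), min(w, c0 + radius + 1)):
            d = np.linalg.norm(feat_map[r, c] - template)
            if d < best_dist:
                best_dist, best_pt = d, (r, c)
    return best_pt
```

Because this baseline matches only on feature distance, a visually similar neighboring pixel can hijack the track over many manipulation steps; the abstract's "discriminative point tracking" is motivated by exactly this failure mode.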