Accuracy and speed are critical in image editing tasks. Pan et al. introduced a drag-based image editing framework that achieves pixel-level control using Generative Adversarial Networks (GANs). A flurry of subsequent studies enhanced this framework's generality by leveraging large-scale diffusion models. However, these methods often suffer from inordinately long processing times (exceeding 1 minute per edit) and low success rates. Addressing these issues head-on, we present LightningDrag, a rapid approach that enables high-quality drag-based image editing in roughly 1 second. Unlike most previous methods, we redefine drag-based editing as a conditional generation task, eliminating the need for time-consuming latent optimization or gradient-based guidance during inference. In addition, the design of our pipeline allows us to train our model on large-scale paired video frames, which contain rich motion information such as object translations, changing poses and orientations, and zooming in and out. By learning from videos, our approach significantly outperforms previous methods in terms of accuracy and consistency. Despite being trained solely on videos, our model generalizes well to local shape deformations not present in the training data (e.g., lengthening hair, twisting rainbows). Extensive qualitative and quantitative evaluations on benchmark datasets corroborate the superiority of our approach. The code and model will be released at https://github.com/magic-research/LightningDrag.