Pix2Gif: Motion-Guided Diffusion for GIF Generation

We present Pix2Gif, a motion-guided diffusion model for image-to-GIF (video) generation. We tackle this problem differently by formulating the task as an image translation problem steered by text and motion magnitude prompts, as shown in the teaser figure. To ensure that the model adheres to motion guidance, we propose a new motion-guided warping module that spatially transforms the features of the source image conditioned on the two types of prompts. Furthermore, we introduce a perceptual loss that keeps the transformed feature map within the same space as the target image, ensuring content consistency and coherence. To prepare for model training, we meticulously curated data by extracting coherent image frames from the TGIF video-caption dataset, which provides rich information about the temporal changes of subjects. After pretraining, we apply our model in a zero-shot manner to a number of video datasets. Extensive qualitative and quantitative experiments demonstrate the effectiveness of our model: it captures not only the semantic prompt from text but also the spatial prompt from motion guidance. We train all our models using a single node of 16xV100 GPUs. Code, dataset and models are made public at: https://hiteshk03.github.io/Pix2Gif/.
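
To make the idea of spatially transforming source features under a motion magnitude prompt concrete, the following is a minimal sketch of backward warping with bilinear sampling. It is not the Pix2Gif implementation: the names `bilinear_sample` and `warp`, the hand-written `flow` field, and the choice to scale the flow linearly by a scalar `magnitude` are all assumptions made for illustration. In the model itself the displacement would presumably be predicted from the text and motion prompts rather than supplied by hand.

```python
import math


def bilinear_sample(feat, x, y):
    """Sample a 2D grid (list of rows) at real-valued (x, y), clamping to the border."""
    h, w = len(feat), len(feat[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * feat[y0][x0] + ax * feat[y0][x1]
    bottom = (1 - ax) * feat[y1][x0] + ax * feat[y1][x1]
    return (1 - ay) * top + ay * bottom


def warp(feat, flow, magnitude):
    """Backward-warp `feat` by `flow` scaled by a scalar motion magnitude.

    flow[y][x] = (dx, dy) is a hypothetical per-pixel displacement. Output
    position (x, y) reads from (x - magnitude * dx, y - magnitude * dy), so
    magnitude 0 returns the input unchanged and larger values move content
    further along the flow direction.
    """
    h, w = len(feat), len(feat[0])
    return [
        [
            bilinear_sample(
                feat,
                x - magnitude * flow[y][x][0],
                y - magnitude * flow[y][x][1],
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


# Toy single-channel "feature map" and a uniform rightward flow (assumed inputs).
feat = [[0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]]
flow = [[(1.0, 0.0)] * 4 for _ in range(2)]
shifted = warp(feat, flow, 1.0)  # each row becomes [0.0, 0.0, 1.0, 2.0]
```

Scaling a single flow field by the magnitude is the simplest way to show how one scalar can control how far content moves; a learned module could condition on the magnitude in a more flexible way.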

