Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos

Generating text-editable and pose-controllable character videos is in high demand for creating diverse digital humans. Nevertheless, this task has been restricted by the absence of a comprehensive dataset featuring paired video-pose captions and of generative prior models for videos. In this work, we design a novel two-stage training scheme that can utilize easily obtained datasets (i.e., image-pose pairs and pose-free videos) and a pre-trained text-to-image (T2I) model to obtain pose-controllable character videos. Specifically, in the first stage, only keypoint-image pairs are used, for controllable text-to-image generation. We learn a zero-initialized convolutional encoder to encode the pose information. In the second stage, we fine-tune the motion of the above network on a pose-free video dataset by adding learnable temporal self-attention and reformed cross-frame self-attention blocks. Powered by these new designs, our method successfully generates continuously pose-controllable character videos while keeping the editing and concept composition abilities of the pre-trained T2I model. The code and models will be made publicly available.
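To illustrate why a zero-initialized convolutional encoder is a gentle way to attach pose conditioning to a pre-trained model, here is a minimal toy sketch in plain Python. It is not the authors' implementation. It assumes the encoded pose is added residually to the pre-trained features, and it uses a 1D convolution over short lists in place of real image tensors. All names (`conv1d`, `inject_pose`, `pose_map`) and all numeric values are hypothetical.

```python
# Toy sketch only: names, shapes, and values are hypothetical, not from the paper's code.

def conv1d(signal, kernel, bias=0.0):
    """'Same'-padded 1D convolution (cross-correlation) over a list of floats.

    Assumes an odd-length kernel so the output has the same length as the input.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel)) + bias
        for i in range(len(signal))
    ]


def inject_pose(features, pose, kernel, bias=0.0):
    """Add the encoded pose to pre-trained features (residual injection is an assumption)."""
    encoded = conv1d(pose, kernel, bias)
    return [f + e for f, e in zip(features, encoded)]


pretrained_features = [0.5, -1.0, 2.0, 0.25]  # stand-in for T2I activations
pose_map = [1.0, 0.0, 3.0, -2.0]              # stand-in for a keypoint map

# With zero-initialized weights the encoder contributes nothing,
# so the pre-trained features pass through unchanged.
zero_kernel = [0.0, 0.0, 0.0]
at_init = inject_pose(pretrained_features, pose_map, zero_kernel)

# Once training moves the weights away from zero, the pose starts to shift the features.
trained_kernel = [0.0, 0.1, 0.0]              # pretend weights after some training
after_training = inject_pose(pretrained_features, pose_map, trained_kernel)

print(at_init == pretrained_features)
print(after_training != pretrained_features)
```

Under these assumptions, training starts from exactly the pre-trained model's behavior, and the pose signal is introduced gradually as the encoder weights are learned.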

