We present DreamPose, a diffusion-based method for generating animated fashion videos from still images. Given an image and a sequence of human body poses, our method synthesizes a video containing both human and fabric motion. To achieve this, we transform a pretrained text-to-image model (Stable Diffusion) into a pose-and-image guided video synthesis model using a novel fine-tuning strategy, a set of architectural changes to support the added conditioning signals, and techniques to encourage temporal consistency. We fine-tune on a collection of fashion videos from the UBC Fashion dataset. We evaluate our method on a variety of clothing styles and poses and demonstrate that it produces state-of-the-art results on fashion video animation. Video results are available on our project page.