We have witnessed the unprecedented success of diffusion-based video generation over the past year. Recently proposed models from the community can generate cinematic, high-resolution videos with smooth motion from arbitrary input prompts. However, as a superset of image generation, video generation requires far more computation, so these models are hosted mostly on cloud servers, which limits broader adoption among content creators. In this work, we propose a comprehensive acceleration framework that brings the power of large-scale video diffusion models to edge users. On the network-architecture side, we initialize from a compact image backbone and search for the design and placement of temporal layers that maximize hardware efficiency. In addition, we propose a dedicated adversarial fine-tuning algorithm for our efficient model that reduces the number of denoising steps to 4. Our model, with only 0.6B parameters, generates a 5-second video on an iPhone 16 Pro Max within 5 seconds. Compared with server-side models that take minutes on powerful GPUs to generate a single video, we accelerate generation by orders of magnitude while delivering on-par quality.
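To make the two main ideas above concrete, the sketch below shows one plausible way to interleave temporal layers into a compact image backbone and to run a few-step (here 4-step) sampler. This is a minimal illustration, not the paper's actual architecture or schedule: the module names, tensor layout, and sigma values are our own assumptions.

```python
# Hedged sketch (illustrative, not the proposed model): extending an image backbone
# with temporal layers, plus a minimal 4-step sampler. All shapes, module names, and
# the sigma schedule are assumptions for demonstration only.
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Self-attention applied along the frame axis only (a common form of 'temporal layer')."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, tokens, dim) -> attend across frames for each spatial token
        b, f, t, d = x.shape
        h = x.permute(0, 2, 1, 3).reshape(b * t, f, d)   # fold spatial tokens into the batch
        h = h + self.attn(self.norm(h), self.norm(h), self.norm(h), need_weights=False)[0]
        return h.reshape(b, t, f, d).permute(0, 2, 1, 3)

class VideoBlock(nn.Module):
    """A hypothetical spatial block (from the image backbone) followed by a temporal layer."""
    def __init__(self, dim: int):
        super().__init__()
        self.spatial = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim), nn.GELU())
        self.temporal = TemporalAttention(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.temporal(x + self.spatial(x))

def sample_4_steps(denoiser, shape, sigmas=(14.0, 4.0, 1.5, 0.5)):
    """Minimal Euler-style few-step sampler; the noise schedule is a placeholder."""
    x = torch.randn(shape) * sigmas[0]
    for i, sigma in enumerate(sigmas):
        x0 = denoiser(x, sigma)                      # predicted clean latent at this noise level
        next_sigma = sigmas[i + 1] if i + 1 < len(sigmas) else 0.0
        x = x0 + (x - x0) * (next_sigma / sigma)     # step toward the next noise level
    return x
```

In such a design, the spatial blocks can reuse the pretrained image weights while only the inserted temporal layers are new, which is one way an image backbone can be adapted to video with a small parameter overhead.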