HunyuanVideo: A Systematic Framework For Large Video Generative Models

Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang, Kathrina Wu, Qin Lin, Junkun Yuan, Yanxin Long, Aladdin Wang, Andong Wang, Changlin Li, Duojun Huang, Fang Yang, Hao Tan, Hongmei Wang, Jacob Song, Jiawang Bai, Jianbing Wu, Jinbao Xue, Joey Wang, Kai Wang, Mengyang Liu, Pengyu Li, Shuai Li, Weiyan Wang, Wenqing Yu, Xinchi Deng, Yang Li, Yi Chen, Yutao Cui, Yuanbo Peng, Zhentao Yu, Zhiyu He, Zhiyong Xu, Zixiang Zhou, Zunnan Xu, Yangyu Tao, Qinglin Lu, Songtao Liu, Daquan Zhou, Hongfa Wang, Yong Yang, Di Wang, Yuhong Liu, Jie Jiang, Caesar Zhong

Recent advancements in video generation have significantly impacted daily life for both individuals and industries. However, the leading video generation models remain closed-source, resulting in a notable performance gap between industry capabilities and those available to the public. In this report, we introduce HunyuanVideo, an innovative open-source video foundation model that demonstrates performance in video generation comparable to, or even surpassing, that of leading closed-source models. HunyuanVideo encompasses a comprehensive framework that integrates several key elements, including data curation, advanced architectural design, progressive model scaling and training, and an efficient infrastructure tailored for large-scale model training and inference. As a result, we successfully trained a video generative model with over 13 billion parameters, making it the largest among all open-source models. We conducted extensive experiments and implemented a series of targeted designs to ensure high visual quality, motion dynamics, text-video alignment, and advanced filming techniques. According to evaluations by professionals, HunyuanVideo outperforms previous state-of-the-art models, including Runway Gen-3, Luma 1.6, and three top-performing Chinese video generative models. By releasing the code for the foundation model and its applications, we aim to bridge the gap between closed-source and open-source communities. This initiative will empower individuals within the community to experiment with their ideas, fostering a more dynamic and vibrant video generation ecosystem. The code is publicly available at https://github.com/Tencent/HunyuanVideo.
