Recent advancements in video generation have significantly impacted daily life for both individuals and industries. However, the leading video generation models remain closed-source, resulting in a notable performance gap between industry capabilities and those available to the public. In this report, we introduce HunyuanVideo, an innovative open-source video foundation model that demonstrates performance in video generation comparable to, or even surpassing, that of leading closed-source models. HunyuanVideo encompasses a comprehensive framework that integrates several key elements, including data curation, advanced architectural design, progressive model scaling and training, and an efficient infrastructure tailored for large-scale model training and inference. As a result, we successfully trained a video generative model with over 13 billion parameters, making it the largest among all open-source models. We conducted extensive experiments and implemented a series of targeted designs to ensure high visual quality, motion dynamics, text-video alignment, and advanced filming techniques. According to evaluations by professionals, HunyuanVideo outperforms previous state-of-the-art models, including Runway Gen-3, Luma 1.6, and three top-performing Chinese video generative models. By releasing the code for the foundation model and its applications, we aim to bridge the gap between closed-source and open-source communities. This initiative will empower individuals within the community to experiment with their ideas, fostering a more dynamic and vibrant video generation ecosystem. The code is publicly available at https://github.com/Tencent/HunyuanVideo.