Video generation models have achieved remarkable progress in the past year. The quality of AI-generated video continues to improve, but at the cost of larger model size, increased data quantity, and greater demand for training compute. In this report, we present Open-Sora 2.0, a commercial-level video generation model trained for only $200k. With this model, we demonstrate that the cost of training a top-performing video generation model is highly controllable. We detail all techniques that contribute to this efficiency breakthrough, including data curation, model architecture, training strategy, and system optimization. According to human evaluation results and VBench scores, Open-Sora 2.0 is comparable to leading video generation models worldwide, including the open-source HunyuanVideo and the closed-source Runway Gen-3 Alpha. By making Open-Sora 2.0 fully open-source, we aim to democratize access to advanced video generation technology, fostering broader innovation and creativity in content creation. All resources are publicly available at: https://github.com/hpcaitech/Open-Sora.