While proprietary systems such as Seedance-2.0 have achieved remarkable success in omni-capable video generation, open-source alternatives significantly lag behind. Most academic models remain heavily fragmented, and the few existing efforts toward unified video generation still struggle to seamlessly integrate diverse tasks within a single framework. To bridge this gap, we propose OmniWeaving, an omni-level video generation model with powerful multimodal composition and reasoning-informed generation capabilities. By leveraging a massive-scale pretraining dataset that spans diverse compositional and reasoning-augmented scenarios, OmniWeaving learns to temporally bind interleaved text, multi-image, and video inputs while acting as an intelligent agent that infers complex user intentions for sophisticated video creation. Furthermore, we introduce IntelligentVBench, the first comprehensive benchmark designed to rigorously assess next-level intelligent unified video generation. Extensive experiments demonstrate that OmniWeaving achieves state-of-the-art performance among open-source unified models. The code and models are publicly available. Project Page: https://omniweaving.github.io.