The fidelity and structural diversity of training datasets fundamentally determine the capabilities of video generation models. While commercial systems show remarkable ability to generate cinematic narratives, the progress of open-source models remains limited by the scarcity of high-quality training data. To bridge this gap, we introduce CineDance-1M, a large-scale, open research Text-to-Audio-Video (T2AV) dataset designed specifically for multi-shot, long-form joint audio-video generation. Averaging 92.8 seconds and 24.2 continuous shots per video, it provides configurable, structured annotations for both audio and video modalities. This exceptional quality is achieved through a rigorous three-stage curation pipeline: i) diverse sourcing and comprehensive cleansing, ii) film-theory-inspired narrative parsing, and iii) hierarchical dual-modal captioning. For a comprehensive assessment, we propose CineBench, featuring a diverse prompt suite and a six-dimensional, human-aligned metric system tailored for complex narrative audio-video evaluation. Furthermore, we adapt LTX-2.3 into CineDance, which demonstrates exceptional single-modality quality alongside precise audio-video alignment and robust subject and environment consistency, effectively validating our curation strategy and the high quality of CineDance-1M. We anticipate that this work will serve as a solid foundation for accelerating future research in multi-shot, long-form joint audio-video generation. Our project page is available at https://aliothchen.github.io/projects/CineDance/.