Automated movie creation requires coordinating multiple characters, modalities, and narrative elements across extended sequences, a challenge that existing end-to-end approaches struggle to address. We present \textbf{CineAGI}, a hierarchical movie generation framework that decomposes this task through specialized multi-agent orchestration. Our framework introduces three key innovations: (1) a multi-agent narrative synthesis module in which specialized LLM agents collaboratively generate comprehensive cinematic blueprints containing character profiles, scene descriptions, and cross-modal specifications; (2) a decoupled character-centric pipeline that maintains identity consistency through instance-level tracking and integration while enabling flexible multi-character composition; and (3) a hierarchical audio-visual synchronization mechanism that ensures frame-level alignment of dialogue, expressions, and music. Extensive experiments demonstrate that, compared to baselines, CineAGI achieves a 40\% improvement in overall consistency, a 4.4\% gain in subject consistency, a 5.4\% enhancement in aesthetic quality, and 28.7\% higher character consistency. Our work establishes a principled foundation for automated multi-scene video generation that preserves narrative coherence and character authenticity.