We present Alice v1, a 14-billion-parameter open-source video generation model that achieves state-of-the-art quality through consistency distillation with score regularization (rCM). Whereas conventional distillation trades quality for speed, we show that rCM-based distillation can exceed the teacher model's quality. We attribute this to three mechanisms: (1) the score regularization term acts as a mode-seeking objective that concentrates probability mass on high-quality outputs rather than covering the full teacher distribution; (2) our targeted synthetic data pipeline with hard-example mining supplies training signal for the specific failure modes (physics, hands, faces) that the teacher handles inconsistently; and (3) consistency enforcement acts as an implicit regularizer, eliminating "lucky path" dependence on specific noise samples. Alice v1 generates 5-second 720p videos at 24 fps in 4 denoising steps (~8 seconds on an H100), a 7x speedup over the 50-step teacher, while raising the VBench score from 84.0 (Wan2.2) to 91.2. On automated benchmarks this surpasses both the teacher and closed-source systems, including Veo3 (~90) and Sora2 (~88), and Alice v1 achieves competitive results in human preference studies. We release all model weights, training code, synthetic data pipelines, and evaluation scripts to advance open research in video generation.
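The abstract does not specify Alice v1's sampler. To illustrate how a consistency-distilled model can generate in only 4 network evaluations, the sketch below applies a generic multistep consistency sampler, in the style of Song et al. (2023), to a toy 1-D Gaussian whose consistency function is known in closed form. The variance-exploding noise process, `EPS`, `T_MAX`, and the 4-step schedule are illustrative assumptions rather than Alice v1's configuration. In the real model, `consistency_fn` would be the distilled network.

```python
import math
import random

# Toy setup (assumption): 1-D data ~ N(MU, S^2) under a variance-exploding
# process x_t = x_0 + t * z, following Song et al. (2023). Not Alice v1's setup.
MU, S = 2.0, 0.5
EPS, T_MAX = 0.002, 80.0


def consistency_fn(x_t: float, t: float) -> float:
    """Exact consistency function for Gaussian data: maps a point on a
    probability-flow ODE trajectory at time t to its point at time EPS.
    For Gaussian data, (x_t - MU) / sqrt(S^2 + t^2) is constant along a trajectory."""
    return MU + (x_t - MU) * math.sqrt(S**2 + EPS**2) / math.sqrt(S**2 + t**2)


def multistep_sample(timesteps: list[float], rng: random.Random) -> float:
    """Multistep consistency sampling: one jump from noise to data, then
    alternate re-noising to a smaller t and jumping back. Uses len(timesteps)
    evaluations of consistency_fn."""
    # First evaluation: the prior N(0, T^2) approximates p_T because T >> S.
    x = consistency_fn(rng.gauss(0.0, timesteps[0]), timesteps[0])
    for t in timesteps[1:]:
        # Re-noise the current estimate to noise level t, then map back.
        x_t = x + math.sqrt(t**2 - EPS**2) * rng.gauss(0.0, 1.0)
        x = consistency_fn(x_t, t)
    return x


rng = random.Random(0)
steps = [T_MAX, 10.0, 2.0, 0.5]  # hypothetical 4-step schedule
samples = [multistep_sample(steps, rng) for _ in range(20_000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(f"mean={mean:.3f} std={std:.3f}")
```

The printed statistics should land close to the data distribution's `MU = 2.0` and `S = 0.5`. Because the consistency function here is exact, each re-noise/denoise round preserves the data distribution; a learned network only approximates this, which is why a few refinement steps can help over a single jump.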