Visual effects (VFX) are crucial to the expressive power of digital media, yet their creation remains a major challenge for generative AI. Prevailing methods often rely on the one-LoRA-per-effect paradigm, which is resource-intensive and fundamentally incapable of generalizing to unseen effects, limiting both scalability and creative freedom. To address this challenge, we introduce VFXMaster, the first unified, reference-based framework for VFX video generation. It recasts effect generation as an in-context learning task, enabling it to reproduce diverse dynamic effects from a reference video onto target content, and it demonstrates remarkable generalization to unseen effect categories. Specifically, we design an in-context conditioning strategy that prompts the model with a reference example, together with an in-context attention mask that precisely decouples and injects the essential effect attributes, allowing a single unified model to master effect imitation without information leakage. In addition, we propose an efficient one-shot effect adaptation mechanism that rapidly boosts generalization to difficult unseen effects from a single user-provided video. Extensive experiments demonstrate that our method effectively imitates diverse categories of effects and generalizes well to out-of-domain effects. To foster future research, we will release our code, models, and a comprehensive dataset to the community.
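To make the in-context attention mask idea concrete, the snippet below is a minimal sketch, not the authors' implementation: it assumes the joint token sequence is partitioned into reference "content" tokens, reference "effect" tokens, and target tokens (a hypothetical split), and builds a boolean mask that lets target tokens read only the reference effect tokens, blocking content leakage. The function name, partition sizes, and the use of `scaled_dot_product_attention` are illustrative assumptions.

```python
# Illustrative sketch of an in-context attention mask (not the VFXMaster code).
# Assumption: tokens are ordered [reference content | reference effect | target].
import torch
import torch.nn.functional as F

def build_in_context_mask(n_ref_content, n_ref_effect, n_target):
    """Boolean mask: entry (query, key) is True where attention is allowed."""
    n = n_ref_content + n_ref_effect + n_target
    mask = torch.zeros(n, n, dtype=torch.bool)
    ref = slice(0, n_ref_content + n_ref_effect)               # all reference tokens
    eff = slice(n_ref_content, n_ref_content + n_ref_effect)   # effect subset
    tgt = slice(n_ref_content + n_ref_effect, n)               # target tokens
    mask[ref, ref] = True   # reference tokens attend within the reference clip
    mask[tgt, tgt] = True   # target tokens attend among themselves
    mask[tgt, eff] = True   # target queries may read reference effect keys only
    return mask

# Toy usage: batch=1, heads=4, head_dim=8, 6+6+10 = 22 tokens in total.
mask = build_in_context_mask(6, 6, 10)
q = k = v = torch.randn(1, 4, 22, 8)
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
print(out.shape)  # torch.Size([1, 4, 22, 8])
```

The key design choice this sketch illustrates is that leakage is prevented structurally, by zeroing the target-to-reference-content block of the mask, rather than by any learned gating.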