OmniLottie is a versatile framework that generates high-quality vector animations from multi-modal instructions. For flexible control over motion and visual content, we focus on Lottie, a lightweight JSON format that represents both shapes and animation behaviors. However, raw Lottie JSON files contain extensive invariant structural metadata and formatting tokens, which pose significant challenges for learning vector animation generation. We therefore introduce a well-designed Lottie tokenizer that transforms JSON files into structured sequences of commands and parameters representing shapes, animation functions, and control parameters. This tokenizer enables us to build OmniLottie on pretrained vision-language models so that it follows interleaved multi-modal instructions and generates high-quality vector animations. To further advance research in vector animation generation, we curate MMLottie-2M, a large-scale dataset of professionally designed vector animations paired with textual and visual annotations. Extensive experiments validate that OmniLottie produces vivid, semantically aligned vector animations that adhere closely to multi-modal human instructions.
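To make the tokenizer idea concrete, the following is a minimal sketch of how a Lottie JSON file might be flattened into a sequence of commands and parameters. It is an illustration under stated assumptions, not the tokenizer described above: the command names (`CANVAS`, `LAYER`, `RECT`, `SET:`, `KEY:`), the helper functions, and the sample document are all hypothetical, and only a few shape types are handled. The only parts taken from the Lottie format itself are the short JSON keys (`w`, `h`, `fr`, `layers`, `shapes`, `ty`, `a`, `k`, `t`, `s`).

```python
import json

# Hypothetical command names for a few Lottie shape types; the actual
# vocabulary of the OmniLottie tokenizer is not specified in this text.
SHAPE_COMMANDS = {"rc": "RECT", "el": "ELLIPSE", "fl": "FILL"}


def emit_property(name, prop, tokens):
    """Append tokens for one Lottie property of the form {"a": 0|1, "k": ...}."""
    if prop.get("a", 0) == 1:
        # Animated property: "k" is a list of keyframes, each with a
        # time "t" and (usually) a start value "s".
        for keyframe in prop["k"]:
            tokens.append(f"KEY:{name}")
            tokens.append(keyframe["t"])
            tokens.extend(keyframe.get("s", []))
    else:
        # Static property: "k" holds the value directly (scalar or list).
        value = prop["k"]
        tokens.append(f"SET:{name}")
        tokens.extend(value if isinstance(value, list) else [value])


def tokenize(lottie):
    """Flatten a parsed Lottie document into a command/parameter sequence."""
    tokens = ["CANVAS", lottie["w"], lottie["h"], lottie["fr"]]
    for layer in lottie.get("layers", []):
        tokens.append("LAYER")
        for item in layer.get("shapes", []):
            command = SHAPE_COMMANDS.get(item.get("ty"))
            if command is None:
                continue  # item types outside this sketch are skipped
            tokens.append(command)
            for name, prop in item.items():
                # Names ("nm") and type tags ("ty") are plain strings and
                # are dropped; only property objects carry parameters.
                if isinstance(prop, dict) and "k" in prop:
                    emit_property(name, prop, tokens)
    return tokens


# Illustrative input: one shape layer with a rectangle whose size animates.
doc = json.loads("""
{"v": "5.7.0", "fr": 30, "ip": 0, "op": 60, "w": 512, "h": 512,
 "layers": [{"ty": 4, "nm": "box", "shapes": [
   {"ty": "rc", "nm": "Rect",
    "p": {"a": 0, "k": [256, 256]},
    "s": {"a": 1, "k": [{"t": 0, "s": [10, 10]}, {"t": 30, "s": [100, 100]}]},
    "r": {"a": 0, "k": 0}}]}]}
""")
tokens = tokenize(doc)
print(tokens)
```

In this sketch, structural metadata such as version strings, layer names, and JSON punctuation never reaches the output; only shape commands, static parameters, and keyframes do, which is the kind of reduction the abstract attributes to the tokenizer.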