Text-to-motion diffusion models can generate realistic animations from text prompts, but they do not support fine-grained motion editing controls. In this paper, we present a method for using natural language to iteratively specify local edits to existing character animations, a task common to most computer animation workflows. Our key idea is to represent the space of motion edits with a set of kinematic motion editing operators (MEOs) that have well-defined semantics for how to modify specific frames of a target motion. We provide an algorithm that leverages pre-existing language models to translate textual descriptions of motion edits into sequences of MEOs. Given the new keyframes produced by the MEOs, we use diffusion-based keyframe interpolation to generate the final motions. Through a user study and quantitative evaluation, we demonstrate that our system performs motion edits that respect the animator's editing intent, remain faithful to the original animation (editing it rather than dramatically changing it), and yield realistic character animation results.
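As a rough illustration of what an operator with well-defined, frame-level semantics might look like, the sketch below defines one hypothetical operator and applies a sequence of such operators to a motion to obtain edited keyframes. Everything here is an assumption made for illustration: the `TranslateJoint` operator, the `apply_operators` helper, and the simplified pose representation (a joint name mapped to a 3D position) are not taken from the paper, and the step where a language model produces the operator sequence, as well as the diffusion-based interpolation step, are not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Pose = Dict[str, Vec3]  # hypothetical simplified pose: joint name -> position


@dataclass(frozen=True)
class TranslateJoint:
    """Hypothetical operator: offset one joint's position at one frame."""
    joint: str
    frame: int
    offset: Vec3


def apply_operators(motion: List[Pose], ops: List[TranslateJoint]) -> Dict[int, Pose]:
    """Apply operators in order and return edited keyframes (frame -> pose).

    The source motion is left untouched; only the frames named by the
    operators appear in the result.
    """
    keyframes: Dict[int, Pose] = {}
    for op in ops:
        if not 0 <= op.frame < len(motion):
            raise IndexError(f"frame {op.frame} is outside the motion")
        # Start from an earlier edit of this frame if one exists,
        # otherwise from a copy of the original pose.
        pose = dict(keyframes.get(op.frame, motion[op.frame]))
        if op.joint not in pose:
            raise KeyError(f"unknown joint: {op.joint}")
        x, y, z = pose[op.joint]
        dx, dy, dz = op.offset
        pose[op.joint] = (x + dx, y + dy, z + dz)
        keyframes[op.frame] = pose
    return keyframes


# Usage: an edit such as "raise the right hand higher in the middle of the
# motion" might map to a single operator on one frame.
motion = [{"right_hand": (0.0, 1.0, 0.0)} for _ in range(3)]
ops = [TranslateJoint(joint="right_hand", frame=1, offset=(0.0, 0.5, 0.0))]
keyframes = apply_operators(motion, ops)
```

In this sketch the edited keyframes are kept separate from the original motion, so a downstream interpolation stage could, in principle, treat them as constraints while the untouched frames remain available as a reference for staying close to the original animation.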