NEURAL MARIONETTE: A Transformer-based Multi-action Human Motion Synthesis System

We present a neural network-based system for long-term, multi-action human motion synthesis. The system, dubbed NEURAL MARIONETTE, produces high-quality, meaningful motions with smooth transitions from simple user input: a sequence of action tags with expected durations and, optionally, a hand-drawn movement trajectory. The core of our system is a novel Transformer-based motion generation model, MARIONET, which generates diverse motions from action tags. Unlike existing motion generation models, MARIONET uses contextual information from the past motion clip and the future action tag, so that each generated action blends smoothly with both the preceding and the following actions. Specifically, MARIONET first encodes the target action tag and the contextual information into an action-level latent code. A time unrolling module then unfolds this code into frame-level control signals, which can be combined with other frame-level control signals such as the target trajectory. Motion frames are then generated auto-regressively. By applying MARIONET sequentially, NEURAL MARIONETTE robustly generates long-term, multi-action motions with the help of two simple schemes, "Shadow Start" and "Action Revision". Along with the system, we present a new dataset dedicated to the multi-action motion synthesis task, which contains both action tags and their contextual information. Extensive experiments study the action accuracy, naturalness, and transition smoothness of the motions generated by our system.
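
The sequential application described above can be illustrated with a minimal sketch of the data flow only. Every name here (`ActionRequest`, `synthesize`, `dummy_generator`, `context_len`) is hypothetical and not taken from the paper. The generator is a trivial stand-in for MARIONET, durations are assumed to be counted in frames, and the "Shadow Start" and "Action Revision" schemes are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Frame = Tuple[float, ...]    # hypothetical pose representation
Point = Tuple[float, float]  # hypothetical 2D trajectory sample


@dataclass
class ActionRequest:
    tag: str       # action tag, e.g. "walk"
    duration: int  # expected duration, assumed here to be in frames


def synthesize(requests: Sequence[ActionRequest],
               generate_action,
               trajectory: Optional[Sequence[Point]] = None,
               context_len: int = 10) -> List[Frame]:
    """Chain per-action clips into one long motion.

    Each call to `generate_action` receives the context named in the
    abstract: the past motion clip and the future action tag, plus the
    matching slice of the optional trajectory.
    """
    motion: List[Frame] = []
    cursor = 0  # index of the next frame along the trajectory
    for i, req in enumerate(requests):
        past_clip = motion[-context_len:]
        next_tag = requests[i + 1].tag if i + 1 < len(requests) else None
        traj = None
        if trajectory is not None:
            traj = trajectory[cursor:cursor + req.duration]
        clip = generate_action(past_clip, req, next_tag, traj)
        motion.extend(clip)
        cursor += len(clip)
    return motion


def dummy_generator(past_clip, target, next_tag, trajectory) -> List[Frame]:
    """Placeholder for MARIONET: emits frames one at a time, each
    depending on the previous frame (auto-regressive in form only)."""
    prev = past_clip[-1] if past_clip else (0.0,)
    frames: List[Frame] = []
    for _ in range(target.duration):
        prev = (prev[0] + 1.0,)
        frames.append(prev)
    return frames


requests = [ActionRequest("walk", 3), ActionRequest("sit", 2)]
motion = synthesize(requests, dummy_generator)
```

Because each clip is conditioned on the tail of the motion generated so far, the second action in this toy run continues from the last frame of the first rather than restarting from a rest pose.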