Synthesizing controllable character motion with deep learning is a promising approach because it can learn a compact model without laborious feature engineering. To produce dynamic motion from weak control signals such as desired paths, existing methods often require auxiliary information such as phases to alleviate motion ambiguity, which limits their generalisation capability. Because past poses often contain useful auxiliary hints, we propose a task-agnostic deep learning method, the Multi-scale Control Signal-aware Transformer (MCS-T), whose attention-based encoder-decoder architecture discovers auxiliary information implicitly, so that controllable motion can be synthesized without explicitly supplying auxiliary information such as phase. Specifically, an encoder adaptively formulates the motion patterns of a character's past poses using multi-scale skeletons, and a decoder driven by control signals synthesizes and predicts the character's state by paying context-specialised attention to the encoded past motion patterns. This design helps alleviate the low responsiveness and slow transitions that often occur in conventional methods that do not use auxiliary information. Both qualitative and quantitative experimental results on an existing biped locomotion dataset, which involves diverse types of motion transitions, demonstrate the effectiveness of our method. In particular, MCS-T generates motions comparable to those generated by methods that use auxiliary information.
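To make the encoder-decoder idea concrete, the following is a minimal illustrative sketch, not the MCS-T implementation. It assumes that each past pose is a short list of per-joint scalars, that a coarser skeleton scale can be approximated by averaging joints within hand-picked groups, and that the control signal has already been projected to the same dimension as the encoded poses. All function names, the joint grouping, and the numeric values are hypothetical; a real model would use learned projections and multiple attention layers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def coarsen(pose, groups):
    """Average joint features within each group (assumed coarse skeleton scale)."""
    return [sum(pose[j] for j in group) / len(group) for group in groups]


def encode_frame(pose, groups):
    """Hypothetical multi-scale feature: fine joints followed by coarse group means."""
    return list(pose) + coarsen(pose, groups)


def cross_attention(query, keys, values):
    """Scaled dot-product attention of one query over a sequence of keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights


# Three past frames, four joint scalars each (made-up values).
past_poses = [
    [0.0, 0.1, 0.2, 0.3],
    [0.1, 0.2, 0.3, 0.4],
    [0.2, 0.3, 0.4, 0.5],
]
joint_groups = [(0, 1), (2, 3)]  # assumed grouping of joints into body parts

# "Encoder": build a multi-scale feature for every past frame.
encoded = [encode_frame(pose, joint_groups) for pose in past_poses]

# "Decoder": the control signal acts as the query over the encoded past motion.
control_query = [1.0, 0.0, 0.0, 0.0, 0.5, 0.5]  # assumed already projected to 6-D
context, weights = cross_attention(control_query, encoded, encoded)
```

Here `weights` indicates how strongly the control-driven query attends to each past frame, and `context` is the attention-weighted summary that a decoder could use to predict the next character state. The point of the sketch is only that the past poses, rather than an explicit phase variable, supply the disambiguating context.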