A new method is proposed for human motion prediction by learning temporal and spatial dependencies. Recently, multiscale graphs have been developed to model the human body at higher abstraction levels, resulting in more stable motion prediction. Current methods, however, predetermine scale levels and combine spatially proximal joints to generate coarser scales based on human priors, even though movement patterns vary across motion sequences and do not fully comply with a fixed graph of spatially connected joints. Another problem with graph convolutional methods is mode collapse, in which predicted poses converge around a mean pose with no discernible movement, particularly in long-term predictions. To tackle these issues, we propose ResChunk, an end-to-end network that explores dynamically correlated body components based on the pairwise relationships between all joints in individual sequences. ResChunk is trained to learn the residuals between target sequence chunks in an autoregressive manner to enforce temporal connectivity between consecutive chunks. It is hence a sequence-to-sequence prediction network that considers dynamic spatio-temporal features of sequences at multiple levels. Our experiments on two challenging benchmark datasets, CMU Mocap and Human3.6M, demonstrate that the proposed method effectively models sequence information for motion prediction and outperforms other techniques, setting a new state of the art. Our code is available at https://github.com/MohsenZand/ResChunk.
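To make the idea of learning residuals between consecutive chunks more concrete, the following is a minimal NumPy sketch, not the ResChunk implementation. It assumes non-overlapping chunks of a fixed length and a flattened pose vector per frame. The function names (`split_into_chunks`, `chunk_residuals`, `rollout`) and the placeholder predictor `zero_model` are hypothetical and stand in for whatever network the actual method uses.

```python
import numpy as np

def split_into_chunks(seq, chunk_len):
    """Split a (T, D) pose sequence into (K, chunk_len, D) non-overlapping chunks.

    Assumption: chunks do not overlap; trailing frames that do not fill a
    whole chunk are dropped.
    """
    num_chunks = seq.shape[0] // chunk_len
    trimmed = seq[: num_chunks * chunk_len]
    return trimmed.reshape(num_chunks, chunk_len, seq.shape[1])

def chunk_residuals(chunks):
    """Residual targets: each chunk minus the chunk that precedes it."""
    return chunks[1:] - chunks[:-1]

def rollout(last_chunk, predict_residual, num_steps):
    """Autoregressive rollout: next chunk = previous chunk + predicted residual."""
    outputs = []
    current = last_chunk
    for _ in range(num_steps):
        current = current + predict_residual(current)
        outputs.append(current)
    return np.stack(outputs)

# Toy data: 40 frames, 6 pose values per frame (hypothetical sizes).
rng = np.random.default_rng(0)
seq = rng.normal(size=(40, 6))

chunks = split_into_chunks(seq, chunk_len=10)   # shape (4, 10, 6)
residuals = chunk_residuals(chunks)             # shape (3, 10, 6)

# Accumulating the residuals onto the first chunk recovers the later chunks.
reconstructed = chunks[0] + np.cumsum(residuals, axis=0)

# Placeholder predictor (hypothetical): always predicts a zero residual,
# so the rollout simply repeats the last observed chunk.
zero_model = lambda chunk: np.zeros_like(chunk)
future = rollout(chunks[-1], zero_model, num_steps=2)  # shape (2, 10, 6)
```

In this sketch, a learned model would replace `zero_model`, and the residual targets from `chunk_residuals` would serve as its training signal. Predicting a change relative to the previous chunk, rather than an absolute pose, ties each predicted chunk to the one before it.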