Human motion prediction, which involves forecasting future body poses from an observed pose sequence, has attracted growing interest. The task is difficult because it requires modeling both spatial and temporal relationships. The most commonly used models are autoregressive models, such as recurrent neural networks (RNNs) and their variants, and Transformer networks. However, RNNs have several drawbacks, such as vanishing or exploding gradients. Other researchers have attempted to solve the communication problem in the spatial dimension by combining Graph Convolutional Networks (GCNs) with Long Short-Term Memory (LSTM) models. These approaches handle temporal and spatial information separately, which limits their effectiveness. To address this problem, we propose the multi-graph convolution network (MGCN) for 3D human pose forecasting. The model captures spatial and temporal information simultaneously by introducing an augmented graph for pose sequences: multiple frames yield multiple parts, which are joined together in a single graph instance. We also explore the influence of natural structure and sequence-aware attention on our model. In experiments on the large-scale benchmark datasets Human3.6M, AMASS, and 3DPW, MGCN outperforms the state of the art in pose prediction.
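
The idea of joining several frames into a single graph instance can be illustrated with a minimal sketch. It is not the MGCN construction itself, which the abstract does not specify. It assumes a toy four-joint chain skeleton, temporal edges that link each joint to the same joint in the next frame, and a standard symmetrically normalized graph convolution. The names `build_augmented_adjacency`, `normalize_adjacency`, and `gcn_layer` are hypothetical and chosen only for illustration.

```python
import numpy as np


def build_augmented_adjacency(num_joints, num_frames, bones):
    """Adjacency over num_frames * num_joints nodes.

    Node index = t * num_joints + j (frame t, joint j).
    Assumption: temporal edges link a joint to itself in the next frame.
    """
    n = num_joints * num_frames
    adj = np.zeros((n, n), dtype=np.float64)
    for t in range(num_frames):
        offset = t * num_joints
        # Spatial edges: skeleton bones inside frame t.
        for a, b in bones:
            adj[offset + a, offset + b] = 1.0
            adj[offset + b, offset + a] = 1.0
        # Temporal edges: same joint in frames t and t + 1.
        if t + 1 < num_frames:
            for j in range(num_joints):
                u, v = offset + j, offset + num_joints + j
                adj[u, v] = 1.0
                adj[v, u] = 1.0
    return adj


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(features, adj_norm, weight):
    """One graph convolution step: ReLU(A_norm @ X @ W)."""
    return np.maximum(adj_norm @ features @ weight, 0.0)


# Toy example: a 4-joint chain skeleton observed over 3 frames.
bones = [(0, 1), (1, 2), (2, 3)]
num_joints, num_frames = 4, 3
adj = build_augmented_adjacency(num_joints, num_frames, bones)

rng = np.random.default_rng(0)
poses = rng.standard_normal((num_frames * num_joints, 3))  # 3D coordinates
weight = rng.standard_normal((3, 8))                       # random stand-in for learned weights
hidden = gcn_layer(poses, normalize_adjacency(adj), weight)
print(adj.shape, hidden.shape)
```

Because spatial and temporal edges live in the same adjacency matrix, a single propagation step mixes information across joints and across frames at once, which is the contrast the abstract draws with pipelines that treat the two dimensions separately.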