Multi-person motion prediction is challenging because each person's motion depends on both their own past movements and their interactions with other people. Transformer-based methods have shown promising results on this task, but they lack an explicit representation of the relations between joints, such as skeleton structure and pairwise distance, which is crucial for accurate interaction modeling. In this paper, we propose the Joint-Relation Transformer, which uses relation information to enhance interaction modeling and improve future motion prediction. Our relation information comprises the relative distance and the intra-/inter-person physical constraints. To fuse relation and joint information, we design a novel joint-relation fusion layer with relation-aware attention that updates both features. Additionally, we supervise the relation information by forecasting future distance. Experiments show that our method achieves a 13.4% improvement in 900 ms VIM on 3DPW-SoMoF/RC and 17.8%/12.0% improvements in 3 s MPJPE on the CMU-Mocap/MuPoTS-3D datasets.
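To make the notion of relation information concrete, the following is a minimal NumPy sketch of how pairwise joint distances and intra-/inter-person structural indicators could be assembled into a single relation tensor. It is an illustration under stated assumptions, not the paper's implementation: the function names, the three-joint toy skeleton, the `bones` list, and the choice to stack the three channels are all hypothetical.

```python
import numpy as np


def pairwise_joint_distance(poses):
    """Euclidean distance between every pair of joints across all people.

    poses: array of shape (P, J, 3), P people with J joints each.
    Returns an array of shape (P*J, P*J).
    """
    flat = poses.reshape(-1, poses.shape[-1])      # (P*J, 3)
    diff = flat[:, None, :] - flat[None, :, :]     # broadcast to (P*J, P*J, 3)
    return np.linalg.norm(diff, axis=-1)


def same_person_mask(num_people, num_joints):
    """True where two joints belong to the same person (intra-person pairs)."""
    ids = np.repeat(np.arange(num_people), num_joints)
    return ids[:, None] == ids[None, :]


def skeleton_adjacency(num_people, num_joints, bones):
    """True where two joints of the same person are linked by a bone."""
    n = num_people * num_joints
    adj = np.zeros((n, n), dtype=bool)
    for p in range(num_people):
        offset = p * num_joints
        for a, b in bones:
            adj[offset + a, offset + b] = True
            adj[offset + b, offset + a] = True
    return adj


# Toy input (assumed): two people, three joints each, standing 3 units apart.
poses = np.array(
    [
        [[0, 0, 0], [0, 1, 0], [0, 2, 0]],
        [[3, 0, 0], [3, 1, 0], [3, 2, 0]],
    ],
    dtype=float,
)
bones = [(0, 1), (1, 2)]  # hypothetical chain skeleton

dist = pairwise_joint_distance(poses)
intra = same_person_mask(2, 3)
adj = skeleton_adjacency(2, 3, bones)

# One possible relation tensor: distance, same-person flag, bone flag.
relation = np.stack([dist, intra.astype(float), adj.astype(float)], axis=-1)
print(relation.shape)
```

In this sketch the inter-person pairs are simply those where the same-person flag is zero. How the actual model encodes these constraints, and how the fusion layer consumes them, is not specified in the abstract.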