3D multi-person motion prediction is a challenging task that requires modeling both individual behaviors and the interactions between people. Although a number of approaches have emerged for this task, comparing them is difficult because standardized training settings and benchmark datasets are lacking. In this paper, we introduce the Multi-Person Interaction Motion (MI-Motion) Dataset, which consists of multi-person skeleton sequences collected with motion capture systems and then refined and synthesized using a game engine. The dataset contains 167k frames of skeleton poses of interacting people, categorized into 5 activity scenes. To facilitate research in multi-person motion prediction, we also provide benchmarks that evaluate prediction methods in three settings: short-term, long-term, and ultra-long-term prediction. Additionally, we introduce a novel baseline approach based on graph and temporal convolutional networks, which achieves competitive results in multi-person motion prediction. We believe that the proposed MI-Motion benchmark dataset and baseline will facilitate future research in this area, ultimately leading to better understanding and modeling of multi-person interactions.