3D multi-person motion prediction is a challenging task that requires modeling both individual behaviors and the interactions between people. Although several approaches to this task have emerged, comparing them is difficult because standardized training settings and benchmark datasets are lacking. In this paper, we introduce the Multi-Person Interaction Motion (MI-Motion) Dataset, which comprises skeleton sequences of multiple individuals captured with motion capture systems and then refined and synthesized using a game engine. The dataset contains 167k frames of skeleton poses of interacting people, organized into 5 activity scenes. To facilitate research in multi-person motion prediction, we also provide benchmarks that evaluate prediction methods in three settings: short-term, long-term, and ultra-long-term prediction. Additionally, we introduce a novel baseline approach that leverages graph and temporal convolutional networks and achieves competitive results in multi-person motion prediction. We believe that the proposed MI-Motion benchmark dataset and baseline will facilitate future research in this area, ultimately leading to better understanding and modeling of multi-person interactions.