Human motion prediction combines two tasks: trajectory forecasting and human pose prediction. Specialized models have been developed for each task. Combining these models for holistic human motion prediction is non-trivial, and recent methods that do so have struggled to compete on the established benchmarks for the individual tasks. To address this, we propose a simple yet effective transformer-based model for human motion prediction. The model uses a stack of self-attention modules to capture both spatial dependencies within a pose and temporal relationships across a motion sequence. This streamlined, end-to-end model is versatile enough to handle pose-only, trajectory-only, and combined prediction tasks without task-specific modifications. Through extensive experiments on a wide range of benchmark datasets, including Human3.6M, AMASS, ETH-UCY, and 3DPW, we demonstrate that this approach achieves state-of-the-art results across all tasks.
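To make the idea of attending over both joints and time steps concrete, the following is a minimal sketch, not the model described above. It assumes a motion sequence is flattened into one token per (frame, joint) pair and passed through a single-head scaled dot-product self-attention layer, so every joint at every frame can attend to every other. The tokenization, the toy sizes, the random projection matrices, and all function names (`flatten_motion`, `self_attention`, `random_matrix`) are illustrative assumptions. The actual model stacks several such modules and learns its weights.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def flatten_motion(motion):
    """Turn a [frames][joints][coords] sequence into one token per (frame, joint)."""
    return [list(joint) for frame in motion for joint in frame]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all tokens.

    Returns the updated tokens and the attention weight matrix.
    """
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(q[0]))
    # scores[i][j] is the similarity between the query of token i and the key of token j.
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned projection weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
frames, joints, coords = 4, 3, 3  # toy sizes chosen for illustration only
motion = [
    [[rng.uniform(-1.0, 1.0) for _ in range(coords)] for _ in range(joints)]
    for _ in range(frames)
]
tokens = flatten_motion(motion)
w_q, w_k, w_v = (random_matrix(coords, coords, rng) for _ in range(3))
updated, weights = self_attention(tokens, w_q, w_k, w_v)
```

Under this assumed layout, a trajectory-only input is the same structure with a single "joint" per frame (for example, a root position), which is one way a single architecture could serve pose-only, trajectory-only, and combined inputs without modification.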