In recent years, Graph Convolutional Networks (GCNs) have been widely used in human motion prediction, but their performance remains unsatisfactory. Recently, MLP-Mixer, initially developed for vision tasks, has been applied to human motion prediction as a promising alternative to GCNs, achieving both better performance and better efficiency. Unlike GCNs, which explicitly capture the bone-joint structure of the human skeleton by representing it as a graph with nodes and edges, MLP-Mixer relies on fully connected layers and thus cannot explicitly model this graph-like structure. To overcome this limitation, we propose \textit{Graph-Guided Mixer}, a novel approach that equips the original MLP-Mixer architecture with the capability to model graph structure. By incorporating graph guidance, \textit{Graph-Guided Mixer} can effectively capture and exploit the specific connectivity patterns in the graph representation of the human skeleton. In this paper, we first uncover a theoretical connection between MLP-Mixer and GCN that is unexplored in existing research. Building on this connection, we then present \textit{Graph-Guided Mixer} and explain how the original MLP-Mixer architecture is redesigned to incorporate guidance from the graph structure. Finally, we conduct an extensive evaluation on the Human3.6M, AMASS, and 3DPW datasets, which shows that our method achieves state-of-the-art performance.
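The contrast drawn in the abstract between fully connected mixing and graph convolution can be made concrete with a small sketch. It is an illustration under stated assumptions and not the architecture proposed in the paper: the toy 5-joint chain skeleton, the helper names, and the adjacency-masking scheme in `graph_masked_mix` are all hypothetical choices made for this example. The sketch shows that a spatial-mixing layer applies a dense joint-by-joint matrix, that a GCN layer applies a normalized adjacency matrix in the same position, and that masking the dense weights with the skeleton graph is one possible way to inject connectivity into the mixing step.

```python
import numpy as np


def normalized_adjacency(edges, num_joints):
    """Symmetric normalized adjacency with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = 1.0
        a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def spatial_mix(x, w):
    """MLP-Mixer-style spatial mixing: a dense (J x J) linear map across joints."""
    return w @ x


def gcn_layer(x, a_hat, w_feat):
    """GCN-style propagation: aggregate over graph neighbours, then mix features."""
    return a_hat @ x @ w_feat


def graph_masked_mix(x, w, a_hat):
    """Hypothetical graph-guided mixing: dense weights masked by skeleton edges.

    This masking scheme is an assumption made for illustration only.
    """
    mask = (a_hat > 0).astype(x.dtype)
    return (w * mask) @ x


# Toy skeleton (assumption): 5 joints connected in a chain, 3 features per joint.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]
NUM_JOINTS, NUM_FEATURES = 5, 3

rng = np.random.default_rng(0)
x = rng.standard_normal((NUM_JOINTS, NUM_FEATURES))
w = rng.standard_normal((NUM_JOINTS, NUM_JOINTS))
a_hat = normalized_adjacency(EDGES, NUM_JOINTS)

dense_out = spatial_mix(x, w)             # every joint sees every other joint
gcn_out = gcn_layer(x, a_hat, np.eye(NUM_FEATURES))
masked_out = graph_masked_mix(x, w, a_hat)  # each joint sees only its neighbours
```

In this toy setting, fixing the spatial-mixing weights to the normalized adjacency matrix reproduces a GCN layer with identity feature weights, which is one simple way to see why the two families of layers can be related. The connection derived in the paper itself may take a different form.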