Humans exhibit complex motions that vary with the task being performed, the interactions they engage in, and subject-specific preferences. Forecasting future poses from the history of previous motion is therefore a challenging task. This paper presents a deep neural network framework augmented with an auxiliary memory to improve the modelling of historical knowledge. Specifically, we disentangle subject-specific, task-specific, and other auxiliary information from the observed pose sequences and utilise these factorised features to query the memory. A novel Multi-Head knowledge retrieval scheme leverages these factorised feature embeddings to perform multiple querying operations over the historical observations captured within the auxiliary memory. Moreover, our proposed dynamic masking strategy makes the feature disentanglement itself dynamic rather than fixed. Two novel loss functions are introduced to encourage diversity within the auxiliary memory while ensuring the stability of its contents, so that the memory can locate and store salient information that aids the long-term prediction of future motion, irrespective of data imbalances or the diversity of the input data distribution. Through extensive experiments on two public benchmarks, Human3.6M and CMU-Mocap, we demonstrate that these design choices collectively allow the proposed approach to outperform the current state-of-the-art methods by significant margins: $>$ 17\% on the Human3.6M dataset and $>$ 9\% on the CMU-Mocap dataset.
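The multi-head retrieval idea can be illustrated with a minimal sketch. The abstract does not specify the retrieval mechanism, so everything here is an assumption made for illustration: a soft, dot-product attention read over memory slots, one query per factorised embedding (subject, task, auxiliary), and the hypothetical names `multi_head_memory_read`, `memory`, and `queries`. It is not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_memory_read(queries, memory):
    """Read from a shared memory once per query head.

    queries: (H, D) array, one query per factorised embedding
             (assumed here: subject, task, auxiliary).
    memory:  (S, D) array of S memory slots.
    Returns the concatenated read vector (H * D,) and the
    per-head attention weights (H, S).
    """
    # Scaled dot-product similarity between each query and each slot.
    scores = queries @ memory.T / np.sqrt(memory.shape[1])
    # Each head gets its own distribution over memory slots.
    weights = softmax(scores, axis=-1)
    # Weighted sum of slots per head, then concatenate the heads.
    read = weights @ memory
    return read.reshape(-1), weights


# Toy usage with random values standing in for learned features.
rng = np.random.default_rng(0)
memory = rng.normal(size=(8, 16))    # 8 slots, 16-dim each (arbitrary sizes)
queries = rng.normal(size=(3, 16))   # subject, task, auxiliary queries
read, weights = multi_head_memory_read(queries, memory)
```

In this sketch each factorised embedding attends to the memory independently, so a subject-specific query and a task-specific query can retrieve different historical observations from the same slots before their reads are combined.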