Stochastic human motion prediction (HMP) has generally been tackled with generative adversarial networks and variational autoencoders. Most prior works aim to predict highly diverse movements in terms of the dispersion of the skeleton joints. This has led to methods that predict fast, motion-divergent movements, which are often unrealistic and incoherent with the past motion. Such methods also neglect contexts that require anticipating diverse low-range behaviors, or actions, with subtle joint displacements. To address these issues, we present BeLFusion, a model that, for the first time, leverages latent diffusion models in HMP to sample from a latent space where behavior is disentangled from pose and motion. As a result, diversity is encouraged from a behavioral perspective. Thanks to our behavior coupler's ability to transfer sampled behavior to ongoing motion, BeLFusion's predictions display a variety of behaviors that are significantly more realistic than those of the state of the art. To support this claim, we introduce two metrics, the Area of the Cumulative Motion Distribution and the Average Pairwise Distance Error, which correlate with our definition of realism according to a qualitative study with 126 participants. Finally, we demonstrate BeLFusion's generalization power in a new cross-dataset scenario for stochastic HMP.