Stochastic human motion prediction (HMP) has generally been tackled with generative adversarial networks and variational autoencoders. Most prior works aim at predicting highly diverse movements in terms of the skeleton joints' dispersion. This has led to methods predicting fast and motion-divergent movements, which are often unrealistic and incoherent with past motion. Such methods also neglect contexts that require anticipating diverse low-range behaviors, or actions, with subtle joint displacements. To address these issues, we present BeLFusion, a model that, for the first time, leverages latent diffusion models in HMP to sample from a latent space where behavior is disentangled from pose and motion. As a result, diversity is encouraged from a behavioral perspective. Thanks to our behavior coupler's ability to transfer sampled behavior to ongoing motion, BeLFusion's predictions display a variety of behaviors that are significantly more realistic than those of the state of the art. To support this claim, we introduce two metrics, the Area of the Cumulative Motion Distribution and the Average Pairwise Distance Error, which are correlated with our definition of realism according to a qualitative study with 126 participants. Finally, we demonstrate BeLFusion's generalization power in a new cross-dataset scenario for stochastic HMP.
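
The abstract names the Average Pairwise Distance Error without defining it. As a rough illustration only, the sketch below assumes it is the absolute difference between the average pairwise distance (the mean Euclidean distance over all pairs of sampled futures) of the predictions and that of a reference set of plausible futures. The function names, the flattened-list input format, and this reading of the metric are assumptions, not the authors' implementation.

```python
from itertools import combinations
from math import dist


def average_pairwise_distance(samples):
    """Mean Euclidean distance over all unordered pairs of samples.

    `samples` is a list of equal-length sequences of floats, each one a
    predicted motion flattened over frames and joint coordinates
    (an assumed input format chosen for this sketch).
    """
    pairs = list(combinations(samples, 2))
    if not pairs:
        # Fewer than two samples: no pairs, so dispersion is taken as zero.
        return 0.0
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def apd_error(predicted, reference):
    """Assumed form of the error: |APD(predicted) - APD(reference)|."""
    return abs(
        average_pairwise_distance(predicted)
        - average_pairwise_distance(reference)
    )
```

Under this reading, a low error means the predicted samples are about as spread out as the reference futures, so a model is penalized both for collapsing to a single motion and for producing exaggerated diversity.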