EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space

Diffusion models, particularly latent diffusion models, have demonstrated remarkable success in text-driven human motion generation. However, it remains challenging for latent diffusion models to effectively compose multiple semantic concepts into a single, coherent motion sequence. To address this issue, we propose EnergyMoGen, which includes two spectrums of Energy-Based Models: (1) We interpret the diffusion model as a latent-aware energy-based model that generates motions by composing a set of diffusion models in latent space; (2) We introduce a semantic-aware energy model based on cross-attention, which enables semantic composition and adaptive gradient descent for text embeddings. To overcome the challenges of semantic inconsistency and motion distortion across these two spectrums, we introduce Synergistic Energy Fusion. This design allows the motion latent diffusion model to synthesize high-quality, complex motions by combining multiple energy terms corresponding to textual descriptions. Experiments show that our approach outperforms existing state-of-the-art models on various motion generation tasks, including text-to-motion generation, compositional motion generation, and multi-concept motion generation. Additionally, we demonstrate that our method can be used to extend motion datasets and improve the text-to-motion task.
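Point (1) above, reading a set of text-conditioned diffusion models as energy terms that can be added in latent space, can be illustrated with the conjunction rule from composable diffusion models (Liu et al., 2022). There, each conditional noise prediction minus the unconditional one approximates the gradient of that concept's energy. Summing the weighted differences therefore composes concepts, which corresponds to a product of the conditional densities. The sketch below is a minimal, stdlib-only illustration under that assumption and is not EnergyMoGen's implementation. The names `compose_noise_predictions`, `composed_eps` and `toy_denoiser`, the float-valued conditions, and the weights are all hypothetical. Real motion latents would be tensors rather than Python lists.

```python
from typing import Callable, Optional, Sequence

Vector = list[float]
# Hypothetical denoiser signature: (latent, timestep, condition or None) -> predicted noise.
Denoiser = Callable[[Vector, int, Optional[float]], Vector]


def compose_noise_predictions(
    eps_uncond: Sequence[float],
    eps_conds: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> Vector:
    """Compose concepts by adding weighted energy-gradient terms.

    Each (eps_cond - eps_uncond) is treated as the gradient of one concept's
    energy; adding them to the unconditional prediction composes the concepts.
    """
    if len(eps_conds) != len(weights):
        raise ValueError("need exactly one weight per condition")
    out = list(eps_uncond)
    for eps_c, w in zip(eps_conds, weights):
        if len(eps_c) != len(out):
            raise ValueError("all predictions must have the same shape")
        for j, (c, u) in enumerate(zip(eps_c, eps_uncond)):
            out[j] += w * (c - u)
    return out


def composed_eps(
    denoiser: Denoiser,
    z: Vector,
    t: int,
    conds: Sequence[float],
    weights: Sequence[float],
) -> Vector:
    """Query the denoiser once unconditionally and once per concept, then compose."""
    eps_u = denoiser(z, t, None)
    eps_cs = [denoiser(z, t, c) for c in conds]
    return compose_noise_predictions(eps_u, eps_cs, weights)


def toy_denoiser(z: Vector, t: int, cond: Optional[float]) -> Vector:
    """Stand-in for a trained latent denoiser: a condition only shifts the output."""
    shift = 0.0 if cond is None else cond
    return [0.1 * zi + shift for zi in z]


if __name__ == "__main__":
    # Two concepts, each pulling on a different latent dimension.
    print(compose_noise_predictions([0.0, 0.0], [[1.0, 0.0], [0.0, 2.0]], [0.5, 0.5]))
    # prints [0.5, 1.0]
```

In this formulation a negative weight pushes the sample away from a concept, which is how composable diffusion expresses negation. The semantic-aware cross-attention energy and Synergistic Energy Fusion described in the abstract are not modeled by this sketch.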
