Denoising diffusion models have shown great promise in human motion synthesis conditioned on natural language descriptions. However, integrating spatial constraints, such as pre-defined motion trajectories and obstacles, remains a challenge, even though it is essential for bridging the gap between isolated human motion and its surrounding environment. To address this issue, we propose Guided Motion Diffusion (GMD), a method that incorporates spatial constraints into the motion generation process. Specifically, we propose an effective feature projection scheme that manipulates the motion representation to enhance the coherence between spatial information and local poses. Together with a new imputation formulation, this allows the generated motion to reliably conform to spatial constraints such as global motion trajectories. Furthermore, given sparse spatial constraints (e.g., sparse keyframes), we introduce a new dense guidance approach that turns a sparse signal, which is susceptible to being ignored during the reverse steps, into denser signals that guide the generated motion toward the given constraints. Extensive experiments show that GMD achieves a significant improvement over state-of-the-art methods in text-based motion generation while allowing control of the synthesized motions with spatial constraints.
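As a rough illustration of the imputation idea mentioned in the abstract, the sketch below shows generic mask-based imputation of the kind commonly used for diffusion inpainting: after a denoising step, entries covered by a constraint are overwritten with the target values, and all other entries are left as generated. This is not GMD's own imputation formulation, which the abstract does not spell out. The function name `impute`, the flat list layout, and the example values are assumptions made purely for illustration.

```python
def impute(sample, target, mask):
    """Overwrite constrained entries of `sample` with `target` values.

    All three arguments are equal-length lists of floats. A mask entry of
    1.0 marks a constrained position (e.g. a trajectory coordinate at a
    keyframe); 0.0 marks a free position that keeps the generated value.
    This layout is a hypothetical simplification, not GMD's representation.
    """
    if not (len(sample) == len(target) == len(mask)):
        raise ValueError("sample, target, and mask must have the same length")
    # Per-entry blend: mask * target + (1 - mask) * sample.
    return [m * t + (1.0 - m) * s for s, t, m in zip(sample, target, mask)]


# Hypothetical values: positions 0 and 2 are constrained, 1 and 3 are free.
generated = [0.5, -0.2, 0.9, 0.1]
constraint = [0.0, 0.0, 1.0, 0.0]
mask = [1.0, 0.0, 1.0, 0.0]

print(impute(generated, constraint, mask))
```

In a full sampler this overwrite would typically be applied at every reverse step, with the target noised to the current step's noise level. The sketch also hints at why sparse constraints are fragile: when only a few mask entries are nonzero, the rest of the sample is never directly corrected, which is the issue the dense guidance approach described above is meant to address.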