Controllable Motion Diffusion Model

Generating realistic and controllable motions for virtual characters is a challenging task in computer animation, with implications for games, simulations, and virtual reality. Recent studies, inspired by the success of diffusion models in image generation, have demonstrated their potential for this task. However, most of these studies are limited to offline applications that target sequence-level generation, in which all frames of a sequence are generated simultaneously. To enable real-time motion synthesis with diffusion models in response to time-varying control signals, we propose the Controllable Motion Diffusion Model (COMODO) framework. Our framework is built on an auto-regressive motion diffusion model (A-MDM), which generates motion sequences step by step. Using only the standard DDPM algorithm without any additional complexity, the framework can generate high-fidelity motion sequences over extended periods under different types of control signals. We then propose a reinforcement learning-based controller and control strategies on top of the A-MDM model, enabling the framework to steer the motion synthesis process across multiple tasks, including target reaching, joystick-based control, goal-oriented control, and trajectory following. The proposed framework generates diverse motions in real time that adapt on the fly to user commands, thereby enhancing the overall user experience. In addition, it is compatible with inpainting-based editing methods and can produce substantially more diverse motions without additional fine-tuning of the base motion generation model. We conduct comprehensive experiments to evaluate the effectiveness of our framework on various tasks and compare its performance against state-of-the-art methods.
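The core mechanism described above, generating one frame at a time by running a standard DDPM reverse process conditioned on the previous frame, can be illustrated with a minimal NumPy sketch. Everything in it is an assumption for illustration rather than the authors' implementation: `toy_eps_model` is a hypothetical placeholder for a trained noise-prediction network, the schedule (T = 50, linear betas) is an illustrative choice, and the optional inpainting branch uses simple replacement of known entries, a generic diffusion inpainting technique that may differ from the editing method used in the paper.

```python
import numpy as np

# Illustrative linear beta schedule in the style of DDPM (assumed values).
T = 50
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)


def toy_eps_model(x_t, t, prev_frame):
    """Hypothetical stand-in for a trained noise-prediction network.

    It returns the noise that would be exact if the clean frame were
    prev_frame, so sampling simply stays near the previous pose.
    """
    return (x_t - np.sqrt(alpha_bars[t]) * prev_frame) / np.sqrt(1.0 - alpha_bars[t])


def ddpm_sample_frame(prev_frame, rng, mask=None, known=None):
    """Sample one frame with the DDPM reverse process, conditioned on prev_frame.

    If mask/known are given, entries where mask is True are constrained to
    known via replacement-based inpainting (an assumed editing scheme).
    """
    x = rng.standard_normal(prev_frame.shape)  # start from pure noise x_T
    for t in reversed(range(T)):
        eps = toy_eps_model(x, t, prev_frame)
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)  # sigma_t^2 = beta_t
        else:
            x = mean  # no noise on the final step
        if mask is not None:
            # Overwrite constrained entries with the known values, noised to level t-1.
            if t > 0:
                ab = alpha_bars[t - 1]
                noised = np.sqrt(ab) * known + np.sqrt(1.0 - ab) * rng.standard_normal(x.shape)
            else:
                noised = known
            x = np.where(mask, noised, x)
    return x


def rollout(init_frame, n_frames, seed=0, mask=None, known=None):
    """Auto-regressive generation: each new frame is conditioned on the last one."""
    rng = np.random.default_rng(seed)
    frames = [init_frame]
    for _ in range(n_frames):
        frames.append(ddpm_sample_frame(frames[-1], rng, mask, known))
    return np.stack(frames)


if __name__ == "__main__":
    motion = rollout(np.zeros(4), n_frames=10)
    print(motion.shape)
```

Because each frame depends only on its predecessor, a control signal (for example, a joystick direction fed to the denoiser or to a learned controller) may change between any two frames, which is what makes on-the-fly interaction possible in this setting.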

翻译：为虚拟角色生成逼真且可控的运动是计算机动画中的一项挑战性任务，其应用延伸至游戏、仿真和虚拟现实领域。受扩散模型在图像生成领域成功经验的启发，近期研究已展现出解决该任务的潜力。然而，这些研究大多局限于针对序列级生成的离线应用（即同时生成所有步骤）。为利用扩散模型实现响应时变控制信号的实时运动合成，我们提出可控运动扩散模型（COMODO）框架。该框架以自回归运动扩散模型（A-MDM）为基础，可逐步生成运动序列。通过这种方式，仅使用标准DDPM算法而无需引入额外复杂性，框架即能在长时间跨度内生成高保真运动序列，并支持不同类型的控制信号。随后，我们在A-MDM模型之上提出基于强化学习的控制器与控制策略，使框架能够引导运动合成过程完成多项任务，包括目标到达、摇杆控制、导向控制及轨迹跟踪。所提框架支持实时生成能够动态适应用户指令的多样化运动，从而提升整体用户体验。此外，该框架与基于修补（inpainting）的编辑方法兼容，可在无需对基础运动生成模型进行额外微调的情况下预测出更具多样性的运动。我们通过全面实验评估框架在各项任务中的有效性，并将其性能与现有最优方法进行比较。