Human motion generation aims to produce natural human pose sequences and shows immense potential for real-world applications. Recent progress in motion data collection technologies and generation methods has laid the foundation for growing interest in the field. Most research focuses on generating human motions from conditional signals such as text, audio, and scene contexts. Despite significant advances in recent years, the task remains challenging because of the intricate nature of human motion and its implicit relationship with conditional signals. In this survey, we present a comprehensive literature review of human motion generation, which, to the best of our knowledge, is the first of its kind in this field. We begin by introducing the background of human motion and generative models, followed by an examination of representative methods for three mainstream sub-tasks: text-conditioned, audio-conditioned, and scene-conditioned human motion generation. Additionally, we provide an overview of common datasets and evaluation metrics. Lastly, we discuss open problems and outline potential future research directions. We hope that this survey will give the community a comprehensive view of this rapidly evolving field and inspire novel ideas that address the outstanding challenges.