Human motion synthesis is a fundamental task in computer animation. Despite recent progress in this field driven by deep learning and motion capture data, existing methods remain limited to specific motion categories, environments, and styles. This poor generalizability can be partially attributed to the difficulty and expense of collecting large-scale, high-quality motion data. At the same time, foundation models trained on internet-scale image and text data have demonstrated surprising world knowledge and reasoning ability across various downstream tasks. Leveraging these foundation models may benefit human motion synthesis, which some recent works have preliminarily explored. However, these methods do not fully unveil the foundation models' potential for this task and support only a few simple actions and environments. In this paper, we explore, for the first time and without any motion data, open-set human motion synthesis based on multimodal large language models (MLLMs), using natural language instructions as user control signals, across arbitrary motion tasks and environments. Our framework consists of two stages: 1) sequential keyframe generation, in which MLLMs act as a keyframe designer and animator; and 2) motion in-filling between keyframes through interpolation and motion tracking. Our method achieves general human motion synthesis for many downstream tasks. The promising results demonstrate the value of mocap-free human motion synthesis aided by MLLMs and pave the way for future research.