In this work, we present MoConVQ, a novel unified framework for physics-based motion control that leverages scalable discrete representations. Building upon vector-quantized variational autoencoders (VQ-VAE) and model-based reinforcement learning, our approach learns motion embeddings from a large, unstructured dataset spanning tens of hours of motion examples. The resulting motion representation not only captures diverse motion skills but also offers a robust and intuitive interface for various applications. We demonstrate the versatility of MoConVQ through several applications: universal tracking control from various motion sources, interactive character control with latent motion representations using supervised learning, physics-based motion generation from natural language descriptions using the GPT framework, and, most interestingly, seamless integration with large language models (LLMs) through in-context learning to tackle complex and abstract tasks.
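To make "discrete representation" concrete, the sketch below shows the generic vector-quantization step at the heart of a VQ-VAE: each continuous latent is replaced by its nearest entry in a learned codebook, and the index of that entry becomes a discrete token. This is a minimal illustration under stated assumptions, not MoConVQ's implementation. The 2-D codebook and latents are toy values, and the function names (`quantize`, `encode_sequence`) are hypothetical. A real VQ-VAE also needs training machinery, such as a straight-through gradient estimator and codebook/commitment losses, which this sketch omits.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(z: Vector, codebook: List[Vector]) -> Tuple[int, Vector]:
    """Map a continuous latent z to its nearest codebook entry.

    Returns the discrete index (the 'token') and the selected code vector.
    """
    index = min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
    return index, codebook[index]


def encode_sequence(latents: List[Vector], codebook: List[Vector]) -> List[int]:
    """Turn a sequence of per-step latents into a sequence of discrete tokens."""
    return [quantize(z, codebook)[0] for z in latents]


if __name__ == "__main__":
    # Hypothetical toy codebook with three 2-D code vectors.
    codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    # Hypothetical per-step latents, e.g. produced by a motion encoder.
    latents = [[0.9, 0.1], [0.1, 0.8], [0.05, -0.02]]
    print(encode_sequence(latents, codebook))  # prints [1, 2, 0]
```

Once motion is expressed as a token sequence like `[1, 2, 0]`, it can be modeled with standard sequence models, which is what makes applications such as GPT-style generation possible.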