We present MoConVQ, a novel unified framework for physics-based motion control that leverages scalable discrete representations. Building on vector quantized variational autoencoders (VQ-VAE) and model-based reinforcement learning, our approach learns motion embeddings from a large, unstructured dataset spanning tens of hours of motion examples. The resulting motion representation not only captures diverse motion skills but also offers a robust and intuitive interface for various applications. We demonstrate the versatility of MoConVQ through several applications: universal tracking control from various motion sources, interactive character control with latent motion representations using supervised learning, physics-based motion generation from natural language descriptions using the GPT framework, and, most interestingly, seamless integration with large language models (LLMs) through in-context learning to tackle complex and abstract tasks.