Purposer: Putting Human Motion Generation in Context

We present a novel method to generate human motion to populate 3D indoor scenes. It can be controlled with various combinations of conditioning signals such as a path in a scene, target poses, past motions, and scenes represented as 3D point clouds. State-of-the-art methods either are specialized to a single setting, require vast amounts of high-quality and diverse training data, or are unconditional models that do not integrate scene or other contextual information. As a consequence, they have limited applicability and rely on costly training data. To address these limitations, we propose a new method, dubbed Purposer, based on neural discrete representation learning. Our model can flexibly exploit different types of information already present in open-access large-scale datasets such as AMASS. First, we encode unconditional human motion into a discrete latent space. Second, an autoregressive generative model, conditioned on key contextual information through either prompting or additive tokens and trained for next-step prediction in this space, synthesizes sequences of latent indices. We further design a novel conditioning block to handle future conditioning information in such a causal model by using a network with two branches to compute separate stacks of features. In this manner, Purposer can generate realistic motion sequences in diverse test scenes. Through exhaustive evaluation, we demonstrate that our multi-contextual solution outperforms existing approaches specialized for specific contextual information, both in terms of quality and diversity. Our model is trained on short sequences, but a byproduct of supporting various conditioning signals is that, at test time, different combinations can be used to chain short sequences together and generate long motions within a context scene.
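The two stages named in the abstract (encoding motion into a discrete latent space, then autoregressively predicting latent indices under a conditioning signal) can be illustrated with a toy sketch. This is not the authors' implementation: the function names `quantize`, `sample_indices`, and `decode`, the tiny codebook, and the stand-in predictor `toy_next_probs` are all hypothetical, and a real system would use learned encoder, decoder, and transformer networks in place of these placeholders.

```python
import random


def quantize(frames, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))
        for frame in frames
    ]


def sample_indices(next_probs, context, steps, rng):
    """Sample latent indices one step at a time.

    next_probs(context, prefix) is a hypothetical stand-in for a trained
    causal model: it returns a probability for every codebook entry,
    given the conditioning signal and the indices generated so far.
    """
    prefix = []
    for _ in range(steps):
        probs = next_probs(context, prefix)
        idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        prefix.append(idx)
    return prefix


def decode(indices, codebook):
    """Look up the codebook vector for each latent index."""
    return [codebook[i] for i in indices]


# Toy data (assumed for illustration): a 3-entry codebook of 2-D vectors.
CODEBOOK = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]


def toy_next_probs(context, prefix):
    """Assumed toy predictor: puts all mass on one index chosen from the
    conditioning value and the number of indices generated so far."""
    k = (len(prefix) + context) % len(CODEBOOK)
    return [1.0 if i == k else 0.0 for i in range(len(CODEBOOK))]


encoded = quantize([[0.1, 0.0], [0.9, 1.1], [1.8, 0.2]], CODEBOOK)
generated = sample_indices(toy_next_probs, context=1, steps=4, rng=random.Random(0))
motion = decode(generated, CODEBOOK)
```

Because generation happens over discrete indices, the conditioning signal only has to steer which index comes next; the codebook lookup then turns the indices back into continuous features.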

