NCHO: Unsupervised Learning for Neural 3D Composition of Humans and Objects

Deep generative models have recently been extended to synthesizing 3D digital humans. However, previous approaches treat clothed humans as a single chunk of geometry without considering the compositionality of clothing and accessories. As a result, individual items cannot be naturally composed into novel identities, which limits the expressiveness and controllability of generative 3D avatars. While several methods attempt to address this by leveraging synthetic data, the interaction between humans and objects is not authentic due to the domain gap, and manual asset creation is difficult to scale to a wide variety of objects. In this work, we present a novel framework for learning a compositional generative model of humans and objects (backpacks, coats, scarves, and more) from real-world 3D scans. Our compositional model is interaction-aware, meaning that the spatial relationship between humans and objects and the mutual shape change caused by physical contact are fully incorporated. The key challenge is that, since humans and objects are in contact, their 3D scans are merged into a single piece. To decompose them without manual annotations, we propose to leverage two sets of 3D scans of a single person, with and without objects. Our approach learns to decompose objects and naturally compose them back into a generative human model in an unsupervised manner. Despite our simple setup requiring only the capture of a single subject with objects, our experiments demonstrate the strong generalization of our model by enabling the natural composition of objects to diverse identities in various poses and the composition of multiple objects, which are unseen in the training data. https://taeksuu.github.io/ncho/
