We present GALA, a framework that takes as input a single-layer clothed 3D human mesh and decomposes it into complete multi-layered 3D assets. The outputs can then be combined with other assets to create novel clothed human avatars in any pose. Existing reconstruction approaches often treat clothed humans as a single layer of geometry and overlook the inherent compositionality of humans with hairstyles, clothing, and accessories, thereby limiting the utility of the meshes for downstream applications. Decomposing a single-layer mesh into separate layers is challenging because it requires synthesizing plausible geometry and texture for severely occluded regions. Moreover, even after a successful decomposition, the meshes are not normalized in terms of pose and body shape, which prevents coherent composition with novel identities and poses. To address these challenges, we propose to leverage the general knowledge of a pretrained 2D diffusion model as a geometry and appearance prior for humans and other assets. We first separate the input mesh using a 3D surface segmentation extracted from multi-view 2D segmentations. We then synthesize the missing geometry of the different layers in both posed and canonical spaces using a novel pose-guided Score Distillation Sampling (SDS) loss. Once the high-fidelity 3D geometry has been inpainted, we apply the same SDS loss to its texture to obtain the complete appearance, including the initially occluded regions. Through a series of decomposition steps, we obtain multiple layers of 3D assets in a shared canonical space normalized in terms of pose and body shape, hence supporting effortless composition with novel identities and reanimation with novel poses. Our experiments demonstrate the effectiveness of our approach on decomposition, canonicalization, and composition tasks compared to existing solutions.
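The abstract states that the 3D surface segmentation is extracted from multi-view 2D segmentations but does not say how the views are fused. The sketch below illustrates one simple possibility, per-vertex majority voting across views. It is an assumption for illustration, not GALA's actual procedure. It also assumes that each vertex has already been projected into every view to look up its 2D label, and that per-view visibility (occlusion) has been computed upstream. The function name `fuse_multiview_labels` and the integer class encoding (e.g. 0 = body, 1 = clothing, 2 = hair) are hypothetical.

```python
import numpy as np


def fuse_multiview_labels(view_labels, visible, num_classes):
    """Hypothetical sketch: fuse per-view 2D segmentation labels into
    per-vertex 3D labels by majority vote (an assumed fusion rule).

    view_labels: (V, N) int array; label of vertex n as seen in view v,
                 assumed already looked up by projecting the vertex into
                 that view's 2D segmentation mask.
    visible:     (V, N) bool array; True if vertex n is unoccluded in view v.
    num_classes: number of segmentation classes (labels in [0, num_classes)).
    Returns an (N,) int array of fused labels; vertices seen in no view get -1.
    """
    view_labels = np.asarray(view_labels, dtype=np.int64)
    visible = np.asarray(visible, dtype=bool)
    num_vertices = view_labels.shape[1]

    # votes[n, c] counts the views in which vertex n is visible with label c.
    votes = np.zeros((num_vertices, num_classes), dtype=np.int64)
    v_idx, n_idx = np.nonzero(visible)
    # np.add.at accumulates correctly even when (n, c) pairs repeat.
    np.add.at(votes, (n_idx, view_labels[v_idx, n_idx]), 1)

    # Majority label per vertex; argmax breaks ties toward the lower class id.
    fused = votes.argmax(axis=1)
    # Vertices never observed in any view stay unlabeled.
    fused[votes.sum(axis=1) == 0] = -1
    return fused


if __name__ == "__main__":
    # Three views, four vertices; the last vertex is occluded in every view.
    labels = [[0, 1, 1, 0],
              [0, 1, 2, 0],
              [1, 1, 2, 0]]
    vis = [[True, True, False, False],
           [True, True, True, False],
           [False, True, True, False]]
    print(fuse_multiview_labels(labels, vis, num_classes=3).tolist())
```

Only views in which a vertex is visible contribute votes, which is why the occluded label `1` for the third vertex in the first view is ignored. Vertices that no view observes are left unlabeled (`-1`); in a full pipeline they would need another rule, such as propagating labels from neighboring vertices.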