3D Magic Mirror: Clothing Reconstruction from a Single Image via a Causal Perspective

This research studies a self-supervised 3D clothing reconstruction method, which recovers the geometric shape and texture of human clothing from a single image. Surveying existing methods, we observe that three primary challenges remain: (1) 3D ground-truth meshes of clothing are usually inaccessible due to annotation difficulties and time costs; (2) conventional template-based methods are limited in modeling non-rigid objects, e.g., handbags and dresses, which are common in fashion images; (3) inherent ambiguity compromises model training, such as the dilemma between a large shape with a remote camera and a small shape with a close camera. To address these limitations, we propose a causality-aware self-supervised learning method to adaptively reconstruct 3D non-rigid objects from 2D images without 3D annotations. In particular, to resolve the inherent ambiguity among four implicit variables, i.e., camera position, shape, texture, and illumination, we introduce an explainable structural causal map (SCM) to build our model. The proposed model structure follows the spirit of the causal map, which explicitly considers the prior template in the camera estimation and shape prediction. During optimization, the causal intervention tool, i.e., two expectation-maximization loops, is deeply embedded in our algorithm to (1) disentangle the four encoders and (2) facilitate the prior template. Extensive experiments on two 2D fashion benchmarks (ATR and Market-HQ) show that the proposed method yields high-fidelity 3D reconstruction. Furthermore, we verify the scalability of the proposed method on a fine-grained bird dataset, i.e., CUB. The code is available at https://github.com/layumi/3D-Magic-Mirror.
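
The shape-versus-camera ambiguity named in challenge (3) can be illustrated with a small numerical sketch. It assumes an idealized pinhole camera looking along the z-axis, which is an assumption made here for illustration and not necessarily the camera model used in the paper; the helper names `project` and `scale` and the toy vertices are likewise hypothetical. Under that assumption, scaling a shape by a factor s while moving it s times farther away leaves its 2D projection unchanged, so image evidence alone cannot tell the two configurations apart.

```python
# Illustrative sketch only: an idealized pinhole camera (an assumption),
# not the camera model or code of the paper.

def project(vertices, distance, focal=1.0):
    """Project object-centred 3D vertices placed `distance` in front of the camera."""
    return [(focal * x / (z + distance), focal * y / (z + distance))
            for x, y, z in vertices]

def scale(vertices, s):
    """Uniformly scale a shape about its own centre."""
    return [(s * x, s * y, s * z) for x, y, z in vertices]

# A toy "small shape" (hypothetical vertices, not real clothing data).
small_shape = [(-0.5, -0.5, -0.5), (0.5, -0.5, 0.5),
               (0.5, 0.5, -0.5), (-0.5, 0.5, 0.5)]

near = project(small_shape, distance=2.0)             # small shape, close camera
far = project(scale(small_shape, 3.0), distance=6.0)  # large shape, remote camera
same_distance = project(scale(small_shape, 3.0), distance=2.0)

# Largest coordinate-wise difference between the two projections.
max_gap = max(abs(a - b) for p, q in zip(near, far) for a, b in zip(p, q))
# For contrast: scaling the shape without moving the camera does change the image.
contrast_gap = max(abs(a - b) for p, q in zip(near, same_distance) for a, b in zip(p, q))

print("ambiguous pair gap:", max_gap)
print("scaled-only gap:", contrast_gap)
```

The first gap is zero up to floating-point rounding, while the second is clearly nonzero. This is the sense in which shape and camera cannot both be recovered from the image alone, and it is the kind of ambiguity the abstract proposes to resolve by considering the prior template in camera estimation and shape prediction.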
