3D shape modeling is labor-intensive and time-consuming, and it requires years of expertise. To facilitate 3D shape modeling, recent work has used 2D sketches and text as conditioning modalities for 3D shape generation networks. However, text does not carry enough fine-grained information and is better suited to describing a category or appearance than geometry, while 2D sketches are ambiguous, and depicting complex 3D shapes in 2D itself requires extensive practice. Instead, we explore virtual reality sketches that are drawn directly in 3D. We assume that the sketches are created by novices without any art training, and we aim to reconstruct physically plausible 3D shapes. Since such sketches are potentially ambiguous, we tackle the problem of generating multiple 3D shapes that follow the structure of the input sketch. Because the training data is limited in size, we design our method carefully, training the model step by step and leveraging a multi-modal 3D shape representation. To guarantee the plausibility of the generated 3D shapes, we leverage a normalizing flow that models the distribution of the latent space of 3D shapes. To encourage fidelity of the generated 3D models to the input sketch, we propose a dedicated loss that we deploy at different stages of the training process. We plan to make our code publicly available.