Recent text-to-3D approaches achieve high-fidelity 3D content generation from text descriptions. However, the generated objects are stochastic and lack fine-grained control. Sketches offer a cheap way to introduce such control. Nevertheless, achieving flexible control from sketches is challenging because they are abstract and ambiguous. In this paper, we present Sketch2NeRF, a multi-view sketch-guided text-to-3D generation framework that adds sketch control to 3D generation. Specifically, our method leverages pretrained 2D diffusion models (e.g., Stable Diffusion and ControlNet) to supervise the optimization of a 3D scene represented by a neural radiance field (NeRF). We propose a novel synchronized generation and reconstruction method to effectively optimize the NeRF. For the experiments, we collected two kinds of multi-view sketch datasets to evaluate the proposed method. We demonstrate that our method can synthesize 3D-consistent content with fine-grained sketch control while remaining faithful to the text prompts. Extensive results show that our method achieves state-of-the-art performance in terms of sketch similarity and text alignment.