Image-to-3D approaches have recently achieved strong results when given a natural image as input. In practical applications, however, such color-rich inputs are not always available, and only sketches may be at hand. Existing sketch-to-3D research has seen limited applicability because sketches lack color information and multi-view content. To overcome these challenges, this paper proposes Sketch3D, a novel generation paradigm that produces realistic 3D assets whose shape aligns with the input sketch and whose color matches a textual description. Concretely, Sketch3D first instantiates the given sketch as a reference image through a shape-preserving generation process. Second, the reference image is used to derive a coarse 3D Gaussian prior, and multi-view, style-consistent guidance images are generated from renderings of the 3D Gaussians. Finally, three strategies are designed to optimize the 3D Gaussians: structural optimization via a distribution transfer mechanism, color optimization with a straightforward MSE loss, and sketch similarity optimization with a CLIP-based geometric similarity loss. Extensive visual comparisons and quantitative analysis illustrate the advantage of Sketch3D in generating realistic 3D assets while preserving consistency with the input.
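The color and sketch-similarity terms of the final optimization stage can be illustrated with a minimal sketch. The abstract names an MSE loss and a CLIP-based geometric similarity loss but gives no formulas, so everything below is an assumption for illustration: the function names, the weights `w_color` and `w_sketch`, and the use of one minus cosine similarity between precomputed embeddings as a stand-in for the CLIP-based term. The structural term (distribution transfer) is not modeled here.

```python
import math
from typing import Sequence


def mse_loss(rendered: Sequence[float], guidance: Sequence[float]) -> float:
    """Mean squared error between two flattened images of equal size."""
    if len(rendered) != len(guidance) or len(rendered) == 0:
        raise ValueError("images must be non-empty and of equal size")
    return sum((r - g) ** 2 for r, g in zip(rendered, guidance)) / len(rendered)


def cosine_similarity_loss(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity between two embedding vectors.

    Assumption: the embeddings come from an image encoder such as CLIP and
    are computed elsewhere; this function only compares them.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / norm


def total_loss(
    rendered: Sequence[float],
    guidance: Sequence[float],
    render_embedding: Sequence[float],
    sketch_embedding: Sequence[float],
    w_color: float = 1.0,   # hypothetical weight, not from the source
    w_sketch: float = 1.0,  # hypothetical weight, not from the source
) -> float:
    """Weighted sum of the color term and the sketch-similarity term."""
    color = mse_loss(rendered, guidance)
    sketch = cosine_similarity_loss(render_embedding, sketch_embedding)
    return w_color * color + w_sketch * sketch
```

In a real pipeline these terms would be computed on differentiable tensors so that gradients flow back to the 3D Gaussian parameters; plain Python lists are used here only to keep the arithmetic visible.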