Recent advances in text-to-3D generation have significantly contributed to automating and democratizing 3D content creation. Building on these developments, we aim to address the limitations of current methods in generating 3D models with creative geometry and styles. We introduce multi-view ControlNet, a novel depth-aware multi-view diffusion model trained on datasets generated from a carefully curated text corpus. Our multi-view ControlNet is then integrated into our two-stage pipeline, ControlDreamer, enabling text-guided generation of stylized 3D models. Additionally, we present a comprehensive benchmark for 3D style editing that encompasses a broad range of subjects, including objects, animals, and characters, to further facilitate research on diverse 3D generation. Our comparative analysis shows that this new pipeline outperforms existing text-to-3D methods, as evidenced by human evaluations and CLIP score metrics.