We tackle the problem of text-driven 3D generation from a geometry alignment perspective. Our goal is to generate multiple objects that are consistent in both semantics and geometry. Recent methods based on Score Distillation have succeeded in distilling knowledge from 2D diffusion models into high-quality objects represented by 3D neural radiance fields (NeRFs). Because these methods handle each text query separately, the resulting objects vary widely in pose and structure. However, in some applications, such as geometry editing, it is desirable to obtain aligned objects. To achieve alignment, we propose to optimize the continuous trajectories between the aligned objects by modeling a space of linear pairwise interpolations of the text embeddings with a single NeRF representation. We demonstrate that similar objects consisting of semantically corresponding parts can be well aligned in 3D space without costly modifications to the generation process. We present several practical scenarios that benefit from geometry alignment, including mesh editing and object hybridization, and experimentally demonstrate the efficiency of our method. https://voyleg.github.io/a3d/
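The idea of modeling linear pairwise interpolations of text embeddings can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `sample_interpolated_embedding`, the uniform sampling of the prompt pair and of the weight `t`, and the random stand-in embeddings are all assumptions made for illustration. The abstract states only that a single NeRF is conditioned on points from this interpolation space. The text encoder, the NeRF, and the Score Distillation loss are not shown.

```python
import numpy as np


def sample_interpolated_embedding(embeddings, rng):
    """Sample a point on the line segment between two distinct prompt embeddings.

    embeddings: array of shape (num_prompts, dim), one row per text prompt.
    Returns the two prompt indices, the weight t, and the mixed embedding.
    """
    num_prompts = embeddings.shape[0]
    # Hypothetical choice: pick the pair and the weight uniformly at random.
    i, j = rng.choice(num_prompts, size=2, replace=False)
    t = rng.uniform(0.0, 1.0)
    # Linear interpolation: t = 0 gives prompt i, t = 1 gives prompt j.
    mixed = (1.0 - t) * embeddings[i] + t * embeddings[j]
    return int(i), int(j), float(t), mixed


rng = np.random.default_rng(0)
# Stand-in for real text-encoder outputs (3 prompts, 8-dimensional).
embeddings = rng.normal(size=(3, 8))
i, j, t, mixed = sample_interpolated_embedding(embeddings, rng)
```

In a training loop of this kind, `mixed` would condition the shared NeRF at each optimization step, so that the endpoints (`t = 0` and `t = 1`) correspond to the individual prompts and intermediate values trace the transition between them.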