Driven by the rapid growth of applications such as the Metaverse, virtual reality (VR), and digital twins, text-to-3D generation has become an active research topic in both academia and industry. Optimization methods based on Score Distillation Sampling (SDS), which exploit 2D diffusion priors, are currently the dominant paradigm in this field. However, owing to the view bias of 2D priors and to the mode-seeking ambiguity, combined with the gradient noise induced by high Classifier-Free Guidance (CFG), these methods still suffer from macro-topological inconsistency (e.g., the Janus problem) and micro-geometric discontinuity. To address these challenges, we propose MOC-3D, a text-to-3D generation method based on geometric manifold and semantic view-order consistency. Built on the ScaleDreamer framework, MOC-3D adds two components: a Semantic View-Order Constraint Module, which rectifies macro-topological inconsistency, and a Manifold-based Feature Continuity Module, which targets micro-geometric discontinuity. Specifically, the Semantic View-Order Constraint Module uses the prior knowledge of CLIP to impose a monotonicity rank constraint on the semantic scores of different views, thereby guiding the global topological structure of the 3D object. The Manifold-based Feature Continuity Module employs a Riemannian metric on the Symmetric Positive Definite (SPD) manifold: by measuring the distance between feature statistical distributions in Riemannian space, it encourages micro-textures to evolve smoothly and continuously across multiple views in a statistical sense. With the two modules optimized jointly at the macro and micro levels, the model improves macro-structural consistency and micro-detail continuity simultaneously.
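To make the two constraints more concrete, the following is a minimal NumPy sketch of how such losses could look. It is not the MOC-3D implementation: the abstract does not give the exact loss forms, so the pairwise hinge used for the rank constraint, the choice of the Log-Euclidean metric as the Riemannian distance, and all names (`view_order_rank_loss`, `spd_covariance`, `log_euclidean_distance`, `margin`, `eps`) are assumptions made for illustration. The per-view scores stand in for CLIP similarities, and the feature matrices stand in for per-view feature maps.

```python
import numpy as np


def view_order_rank_loss(scores, margin=0.05):
    """Hinge penalty encouraging scores to decrease along the view order.

    Assumption: scores[i] is a semantic score for the i-th view (for example
    a CLIP similarity to a "front view" prompt), with views ordered from
    front to back, so the score is expected to fall monotonically.
    """
    scores = np.asarray(scores, dtype=float)
    # A term is positive when the later view does not score at least
    # `margin` lower than the view before it.
    violations = scores[1:] - scores[:-1] + margin
    return float(np.maximum(violations, 0.0).sum())


def spd_covariance(features, eps=1e-5):
    """Map an (n, d) feature matrix to a regularized (d, d) covariance.

    Adding eps * I keeps the matrix strictly positive definite, so it lies
    on the SPD manifold even when n is small.
    """
    features = np.asarray(features, dtype=float)
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(len(features) - 1, 1)
    return cov + eps * np.eye(cov.shape[0])


def spd_log(matrix):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def log_euclidean_distance(a, b):
    """Log-Euclidean distance between two SPD matrices (assumed metric)."""
    return float(np.linalg.norm(spd_log(a) - spd_log(b), ord="fro"))


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Hypothetical semantic scores for four views ordered front to back.
    print("rank loss:", view_order_rank_loss([0.31, 0.27, 0.29, 0.18]))

    # Hypothetical features for two adjacent views: 256 locations, 8 channels.
    view_a = rng.normal(size=(256, 8))
    view_b = view_a + 0.1 * rng.normal(size=(256, 8))
    dist = log_euclidean_distance(spd_covariance(view_a), spd_covariance(view_b))
    print("SPD distance between adjacent views:", dist)
```

In a sketch like this, the rank term acts on one scalar per view and so can only shape coarse, global structure, while the SPD term compares second-order feature statistics of neighboring views, which is one way to read the abstract's "continuity in a statistical sense". The affine-invariant Riemannian metric would be an equally plausible choice for the distance.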