Text-to-3D generation with diffusion models often suffers from the Janus problem, in which the generated geometry is inconsistent across viewpoints. This work identifies viewpoint bias in 2D diffusion priors as the main cause and proposes Structural Energy-Guided Sampling (SEGS), a training-free, plug-and-play framework for improving multi-view consistency. SEGS constructs a structural energy in the PCA subspace of U-Net features and injects the gradient of this energy into the denoising process, so it can be integrated into SDS/VSD pipelines without retraining. Experiments show that SEGS reduces the Janus Rate by about 10% on average and improves View-CS scores across multiple baselines, including DreamFusion, Magic3D, and LucidDreamer. The method alleviates viewpoint artifacts while preserving appearance fidelity, offering a flexible approach to high-quality text-to-3D content generation.
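The guidance mechanism summarized above (an energy defined in a PCA subspace of features, whose gradient is injected into denoising) can be illustrated with a minimal NumPy sketch. This is not the SEGS implementation, and the concrete energy used by SEGS is not specified here. The following are all assumptions made for illustration: a linear map `W` stands in for the U-Net feature extractor so that the gradient can be written analytically; the energy is a hypothetical squared distance between the projected features and a reference code `z_ref`; and the gradient is injected in the standard classifier-guidance form, where `eps_hat = eps + scale * sqrt(1 - alpha_bar) * dE/dx`. All function names are hypothetical.

```python
import numpy as np


def pca_basis(features, k):
    """Mean and top-k principal directions of an (n_samples, d) feature bank."""
    mu = features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(features - mu, full_matrices=False)
    return mu, vt[:k].T  # P has shape (d, k) with orthonormal columns


def structural_energy(x, W, mu, P, z_ref):
    """Hypothetical structural energy and its gradient with respect to x.

    E(x) = 0.5 * || P^T (W x - mu) - z_ref ||^2, where W is a linear
    stand-in for the U-Net feature extractor (an assumption for this sketch).
    """
    r = P.T @ (W @ x - mu) - z_ref   # residual in the PCA subspace
    energy = 0.5 * float(r @ r)
    grad = W.T @ (P @ r)             # analytic dE/dx for the linear stand-in
    return energy, grad


def guided_eps(eps_pred, x, alpha_bar, scale, W, mu, P, z_ref):
    """Inject the energy gradient into a noise prediction (classifier-guidance form).

    Adding sqrt(1 - alpha_bar) * dE/dx to eps moves the implied clean-sample
    estimate (x - sqrt(1 - alpha_bar) * eps) / sqrt(alpha_bar) toward lower energy.
    """
    _, grad = structural_energy(x, W, mu, P, z_ref)
    return eps_pred + scale * np.sqrt(1.0 - alpha_bar) * grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_x, d_f, k = 8, 16, 3
    W = rng.standard_normal((d_f, d_x))          # hypothetical feature map
    mu, P = pca_basis(rng.standard_normal((50, d_f)), k)
    x = rng.standard_normal(d_x)
    eps = rng.standard_normal(d_x)
    eps_hat = guided_eps(eps, x, alpha_bar=0.5, scale=1.0,
                         W=W, mu=mu, P=P, z_ref=np.zeros(k))
    print(eps_hat.shape)
```

In a real SDS/VSD pipeline, `W @ x` would be replaced by intermediate U-Net activations and the gradient would come from automatic differentiation rather than the closed form used here; `scale` trades off structural consistency against fidelity to the original prior.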