The view inconsistency problem in score-distilling text-to-3D generation, also known as the Janus problem, arises from the intrinsic bias of 2D diffusion models and leads to unrealistic 3D objects. In this work, we explore score-distilling text-to-3D generation and identify the main causes of the Janus problem. Based on these findings, we propose two approaches to debias score-distillation frameworks for robust text-to-3D generation. Our first approach, called score debiasing, gradually increases the truncation value for the score estimated by 2D diffusion models throughout the optimization process. Our second approach, called prompt debiasing, uses a language model to identify conflicting words between user prompts and view prompts, and adjusts the discrepancy between view prompts and object-space camera poses. Our experimental results show that our methods improve realism by significantly reducing artifacts and achieve a good trade-off between faithfulness to the 2D diffusion models and 3D consistency with little overhead.
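To make the score debiasing idea concrete, the following is a minimal sketch of a truncation schedule. It is not the authors' implementation. The abstract states only that the truncation value grows gradually during optimization, so the linear schedule, the element-wise clamping, the function names, and the values `c_min` and `c_max` are all assumptions chosen for illustration.

```python
# Hypothetical sketch of score debiasing: truncate (clamp) the estimated
# score with a threshold that grows as optimization proceeds.
# Assumptions (not stated in the abstract): linear schedule, element-wise
# clamping, and the default values of c_min / c_max.

def truncation_threshold(step, total_steps, c_min=0.5, c_max=2.0):
    """Linearly increase the truncation value from c_min to c_max."""
    if total_steps <= 1:
        return c_max
    # Fraction of the optimization completed, kept within [0, 1].
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return c_min + frac * (c_max - c_min)


def clip_score(score, threshold):
    """Clamp every component of the score to [-threshold, threshold]."""
    return [max(-threshold, min(threshold, s)) for s in score]


def debiased_score(score, step, total_steps, c_min=0.5, c_max=2.0):
    """Apply the step-dependent truncation to a score (a flat list of floats)."""
    threshold = truncation_threshold(step, total_steps, c_min, c_max)
    return clip_score(score, threshold)


if __name__ == "__main__":
    raw = [3.0, -3.0, 0.1]  # stand-in for a score estimated by a 2D diffusion model
    for step in (0, 50, 100):
        print(step, debiased_score(raw, step, total_steps=101))
```

Under these assumptions, large score components are suppressed strongly early in optimization and progressively less later on, while small components pass through unchanged at every step.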