Existing score-distilling text-to-3D generation techniques, despite their considerable promise, often suffer from view inconsistency. One of the most notable issues is the Janus problem, where the most canonical view of an object (\textit{e.g.}, a face or head) reappears in other views. In this work, we explore existing frameworks for score-distilling text-to-3D generation and identify the main cause of the view inconsistency problem: the embedded bias of 2D diffusion models. Based on these findings, we propose two approaches to debias score-distillation frameworks for view-consistent text-to-3D generation. Our first approach, called score debiasing, truncates the score estimated by 2D diffusion models and gradually increases the truncation value throughout the optimization process. Our second approach, called prompt debiasing, uses a language model to identify words in the user prompt that conflict with the view prompt, and adjusts for the discrepancy between view prompts and the actual viewing direction of an object. Our experimental results show that our methods improve the realism of the generated 3D objects by significantly reducing artifacts, and that they achieve a good trade-off between faithfulness to the 2D diffusion models and 3D consistency with little overhead. Our project page is available at~\url{https://susunghong.github.io/Debiased-Score-Distillation-Sampling/}.
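The score debiasing idea described above (truncate the estimated score, and loosen the truncation as optimization proceeds) can be illustrated with a minimal sketch. The abstract does not specify the schedule or its values, so the linear schedule, the function names `truncation_threshold` and `debias_score`, and the defaults `c_min` and `c_max` are all assumptions made for illustration, not the authors' implementation.

```python
def truncation_threshold(step, total_steps, c_min=0.5, c_max=2.0):
    """Clipping threshold at a given optimization step.

    Assumption: a linear ramp from c_min to c_max. The abstract only
    says the truncation value is gradually increased.
    """
    frac = step / max(total_steps - 1, 1)
    frac = min(max(frac, 0.0), 1.0)  # keep the ramp inside [0, 1]
    return c_min + frac * (c_max - c_min)


def debias_score(score, step, total_steps, c_min=0.5, c_max=2.0):
    """Clip each score component to [-c, c] for the current threshold c.

    `score` is a flat list of floats standing in for the score
    estimated by a 2D diffusion model (hypothetical input).
    """
    c = truncation_threshold(step, total_steps, c_min, c_max)
    return [min(max(s, -c), c) for s in score]


if __name__ == "__main__":
    toy_score = [-3.0, 0.1, 3.0]
    for step in (0, 5, 10):
        print(step, debias_score(toy_score, step, total_steps=11))
```

Under this assumed schedule, large score components are clipped tightly early in optimization and are allowed progressively more influence later on, while small components pass through unchanged.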