Score distillation sampling (SDS), a methodology in which the score from pretrained 2D diffusion models is distilled into a 3D representation, has recently brought significant advancements to the text-to-3D generation task. However, this approach still suffers from critical geometric inconsistency problems such as the Janus problem. Starting from the hypothesis that such problems may be induced by multiview inconsistencies between the 2D scores predicted from different viewpoints, we introduce GSD, a simple and general plug-and-play framework for incorporating 3D consistency, and therefore geometry awareness, into the SDS process. Our methodology is composed of three components: 3D consistent noising, designed to produce 3D consistent noise maps that perfectly follow the standard Gaussian distribution; geometry-based gradient warping, which identifies correspondences between the gradients predicted from different viewpoints; and a novel gradient consistency loss, which optimizes the scene geometry toward producing more consistent gradients. We demonstrate that our method significantly improves performance, successfully addressing the geometric inconsistency problems in the text-to-3D generation task at minimal computational cost while remaining compatible with existing score distillation-based models. Our project page is available at https://ku-cvlab.github.io/GSD/.
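The requirement that 3D consistent noise maps still follow the standard Gaussian distribution can be illustrated with a small sketch. This is not the GSD implementation, and the abstract does not specify how the noise is built. The sketch assumes that noise is attached to 3D points, that several points may project to the same pixel, and that the per-pixel value is the sum of those samples divided by the square root of their count. The names `aggregate_noise` and `pixel_to_points`, and the toy projection, are hypothetical.

```python
import math
import random
import statistics


def aggregate_noise(point_noise, pixel_to_points, rng):
    """Combine per-point noise into one value per pixel with unit variance.

    Assumption (not from the paper): the sum of k independent N(0, 1)
    samples is rescaled by 1/sqrt(k), which again yields N(0, 1).
    """
    pixel_noise = []
    for point_ids in pixel_to_points:
        if not point_ids:
            # No 3D point lands on this pixel: draw fresh 2D noise (assumption).
            pixel_noise.append(rng.gauss(0.0, 1.0))
            continue
        total = sum(point_noise[i] for i in point_ids)
        pixel_noise.append(total / math.sqrt(len(point_ids)))
    return pixel_noise


rng = random.Random(0)
num_points = 40000
point_noise = [rng.gauss(0.0, 1.0) for _ in range(num_points)]

# Toy stand-in for a camera projection: each pixel receives 1 to 4 points.
pixel_to_points = []
start = 0
while start < num_points:
    count = rng.randint(1, 4)
    pixel_to_points.append(list(range(start, min(start + count, num_points))))
    start += count

noise_map = aggregate_noise(point_noise, pixel_to_points, rng)
mean = statistics.fmean(noise_map)
variance = statistics.pvariance(noise_map)

# Plain averaging, shown for contrast, shrinks the variance below 1.
averaged = [sum(point_noise[i] for i in ids) / len(ids) for ids in pixel_to_points]
averaged_variance = statistics.pvariance(averaged)

print(f"normalized: mean={mean:.3f}, variance={variance:.3f}")
print(f"averaged:   variance={averaged_variance:.3f}")
```

Because every view would read from the same `point_noise`, pixels in different views that see the same 3D points receive correlated noise, while the rescaling keeps each individual map at unit variance. Plain averaging would not preserve that property, which is why the sketch divides by the square root of the count.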