Texture editing is a crucial task in 3D modeling that allows users to automatically manipulate the surface materials of 3D models. However, the inherent complexity of 3D models and the ambiguity of text descriptions make this task challenging. To address this challenge, we propose ITEM3D, an illumination-aware model for automatic 3D object editing guided by text prompts. Leveraging diffusion models and differentiable rendering, ITEM3D uses rendered images as a bridge between text and the 3D representation, and optimizes the disentangled texture and environment map. Previous methods adopt an absolute editing direction, namely score distillation sampling (SDS), as the optimization objective, which unfortunately results in noisy appearance and text inconsistency. To solve the problem caused by ambiguous text, we introduce a relative editing direction, an optimization objective defined by the noise difference between the source and target texts, to resolve the semantic ambiguity between texts and images. Additionally, we gradually adjust the direction during optimization to further address unexpected deviation in the texture domain. Qualitative and quantitative experiments show that ITEM3D outperforms state-of-the-art methods on various 3D objects. We also perform text-guided relighting to demonstrate explicit control over lighting.
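The contrast between the absolute (SDS-style) direction and the relative direction described in the abstract can be illustrated with a small sketch. It is not the ITEM3D implementation: the abstract does not give the exact formulation, so `toy_noise_predictor`, the constant text embeddings, and the simple noising step are hypothetical stand-ins for a real text-conditioned diffusion model. The sketch only assumes that the absolute direction compares a predicted noise with the injected noise, while the relative direction compares the predictions for the target and source texts.

```python
import numpy as np


def toy_noise_predictor(noisy_image, text_embedding, t):
    """Hypothetical stand-in for a text-conditioned diffusion denoiser."""
    # Toy linear model: the prediction depends on both the image and the text.
    return 0.1 * noisy_image + t * text_embedding


def absolute_direction(noisy_image, noise, target_emb, t):
    """SDS-style direction: predicted noise for the target text minus injected noise."""
    return toy_noise_predictor(noisy_image, target_emb, t) - noise


def relative_direction(noisy_image, source_emb, target_emb, t):
    """Noise difference between the target-text and source-text predictions."""
    return (toy_noise_predictor(noisy_image, target_emb, t)
            - toy_noise_predictor(noisy_image, source_emb, t))


rng = np.random.default_rng(0)
image = rng.normal(size=(4, 4))   # stands in for a rendered view of the 3D object
noise = rng.normal(size=(4, 4))   # noise injected by the (toy) forward process
t = 0.5                           # assumed noise level in (0, 1)

# Toy forward process: blend the rendered image with the injected noise.
noisy = np.sqrt(1.0 - t) * image + np.sqrt(t) * noise

# Hypothetical embeddings for the source and target text prompts.
source_emb = np.full((4, 4), 0.2)
target_emb = np.full((4, 4), 0.8)

abs_dir = absolute_direction(noisy, noise, target_emb, t)
rel_dir = relative_direction(noisy, source_emb, target_emb, t)
same_text_dir = relative_direction(noisy, source_emb, source_emb, t)
```

In this toy setting the image-dependent term and the injected noise cancel in the relative direction, so it depends only on the difference between the two prompts and vanishes when the source and target texts coincide. The absolute direction still carries the random injected noise. A real denoiser is nonlinear, so the cancellation would be only approximate there.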