Transferring appearance to 3D assets using different representations of the appearance object, such as images or text, has garnered interest due to its wide range of applications in industries like gaming, augmented reality, and digital content creation. However, state-of-the-art methods still fail when the geometry of the input and appearance objects differs significantly. A straightforward approach is to directly apply a 3D generative model, but we show that this ultimately fails to produce appealing results. Instead, we propose a principled approach inspired by universal guidance. Given a pretrained rectified flow model conditioned on images or text, our training-free method interacts with the sampling process by periodically adding guidance. This guidance can be modeled as a differentiable loss function, and we experiment with two types of guidance: part-aware losses for appearance and for self-similarity. Our experiments show that our approach successfully transfers texture and geometric details to the input 3D asset, outperforming baselines both qualitatively and quantitatively. We also show that traditional metrics are not suitable for evaluating this task due to their inability to focus on local details and to compare dissimilar inputs in the absence of ground-truth data. We therefore evaluate appearance transfer quality with a GPT-based system that objectively ranks outputs, ensuring robust and human-like assessment, as further confirmed by our user study. Beyond the showcased scenarios, our method is general and could be extended to other types of diffusion models and guidance functions.
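The sampling-time guidance described above can be illustrated with a minimal sketch. This is not the paper's implementation: the velocity field and the guidance loss below are toy stand-ins (a pretrained rectified-flow model and the part-aware appearance/self-similarity losses would take their place), and the names `velocity`, `guidance_loss_grad`, and `guided_sample` are hypothetical. It only shows the structure of the loop: ordinary Euler flow steps, with a gradient step on a differentiable loss injected periodically.

```python
import numpy as np

def velocity(x, t):
    # Toy stand-in for a pretrained rectified-flow velocity field:
    # here it simply drifts samples toward the origin.
    return -x

def guidance_loss_grad(x, target):
    # Gradient of a simple differentiable guidance loss
    # L(x) = 0.5 * ||x - target||^2, standing in for the
    # part-aware appearance / self-similarity losses.
    return x - target

def guided_sample(x0, target, steps=50, guide_every=5, guide_lr=0.1):
    """Euler integration of a rectified flow with training-free
    guidance added periodically to the sampling trajectory."""
    x = x0.copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + velocity(x, t) * dt       # ordinary flow step
        if i % guide_every == 0:          # periodic guidance step
            x = x - guide_lr * guidance_loss_grad(x, target)
    return x
```

With guidance enabled, the trajectory is pulled toward the target of the loss; setting `guide_lr=0` recovers plain, unguided sampling, which is a convenient ablation.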