Style-guided texture generation aims to produce a texture that is harmonious with both the style of a reference image and the geometry of an input mesh, given the reference style image and a 3D mesh with its text description. Although diffusion-based 3D texture generation methods, such as distillation sampling, have numerous promising applications in stylized games and films, they must address two challenges: 1) completely decoupling style and content from the reference image for 3D models, and 2) aligning the generated texture with the color tone and style of the reference image as well as the given text prompt. To this end, we introduce StyleTex, a novel diffusion-model-based framework for creating stylized textures for 3D models. Our key insight is to decouple style information from the reference image while disregarding its content during diffusion-based distillation sampling. Specifically, given a reference image, we first extract its style feature from the image CLIP embedding by subtracting the embedding's orthogonal projection in the direction of the content feature, which is represented by a text CLIP embedding. This disentanglement of the reference image's style and content yields distinct style and content features. We then inject the style feature into the cross-attention mechanism to incorporate it into the generation process, while using the content feature as a negative prompt to further suppress content information. Finally, we integrate these strategies into StyleTex to obtain stylized textures. The textures generated by StyleTex retain the style of the reference image while also aligning with the text prompt and the intrinsic details of the given 3D mesh. Quantitative and qualitative experiments show that our method outperforms existing baselines by a significant margin.
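The style-content decoupling described above reduces to an orthogonal-projection subtraction in embedding space. A minimal NumPy sketch of that step follows; the function name `decouple_style` and the use of random vectors in place of real CLIP embeddings are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def decouple_style(image_emb: np.ndarray, content_emb: np.ndarray) -> np.ndarray:
    """Subtract from the image embedding its orthogonal projection onto the
    content direction, leaving a residual 'style' feature.
    (Illustrative sketch; stands in for the CLIP-embedding step.)"""
    c_hat = content_emb / np.linalg.norm(content_emb)  # unit content direction
    projection = np.dot(image_emb, c_hat) * c_hat      # content component of image embedding
    return image_emb - projection                      # residual = style feature

# Toy example with random 512-d vectors standing in for CLIP embeddings.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)    # stand-in for the reference image's CLIP embedding
content_emb = rng.normal(size=512)  # stand-in for the text (content) CLIP embedding
style_feat = decouple_style(image_emb, content_emb)
```

By construction, the resulting style feature is (numerically) orthogonal to the content direction, which is what allows the content embedding to be reused separately, e.g. as a negative prompt.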