Prior material creation methods struggled to produce diverse results, mainly because reconstruction-based methods rely on real-world measurements and generation-based methods are trained on relatively small material datasets. To address these challenges, we propose DreamPBR, a novel diffusion-based generative framework that creates spatially-varying appearance properties guided by text and multi-modal controls, providing high controllability and diversity in material generation. The key to diverse, high-quality PBR material generation is combining the capabilities of recent large-scale vision-language models, trained on billions of text-image pairs, with material priors derived from hundreds of PBR material samples. We use a novel material Latent Diffusion Model (LDM) to establish the mapping between albedo maps and the corresponding latent space. The latent representation is then decoded into full SVBRDF parameter maps by a rendering-aware PBR decoder. Our method supports tileable generation through convolution with circular padding. Furthermore, we introduce a multi-modal guidance module, comprising pixel-aligned guidance, style image guidance, and 3D shape guidance, to enhance the control capabilities of the material LDM. We demonstrate the effectiveness of DreamPBR in material creation, showcasing its versatility and user-friendliness across a wide range of controllable generation and editing applications.
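To illustrate why convolution with circular padding yields tileable output, the following is a minimal pure-Python sketch, not the DreamPBR implementation. The helper names `circular_conv2d` and `roll2d` are hypothetical and introduced only for this example. With circular padding, a pixel just past one edge is read from the opposite edge, so the operation treats the image as a torus. As a result, wrapping the input around by any offset simply wraps the output by the same offset, and no seam appears where the left and right (or top and bottom) borders meet.

```python
def circular_conv2d(image, kernel):
    """Correlate a 2D image with an odd-sized kernel, wrapping at the borders.

    `image` and `kernel` are lists of lists of numbers. This is an
    illustrative helper, not part of any library.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2  # half-sizes of the kernel
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # Modulo indexing plays the role of circular padding:
                    # reads past one edge wrap around to the opposite edge.
                    src_y = (y + i - ph) % h
                    src_x = (x + j - pw) % w
                    acc += kernel[i][j] * image[src_y][src_x]
            out[y][x] = acc
    return out


def roll2d(image, dy, dx):
    """Shift an image by (dy, dx) with wrap-around (hypothetical helper)."""
    h, w = len(image), len(image[0])
    return [[image[(y - dy) % h][(x - dx) % w] for x in range(w)]
            for y in range(h)]


if __name__ == "__main__":
    img = [[1, 2, 3, 4],
           [5, 6, 7, 8],
           [9, 10, 11, 12],
           [13, 14, 15, 16]]
    laplacian = [[0, 1, 0],
                 [1, -4, 1],
                 [0, 1, 0]]
    filtered = circular_conv2d(img, laplacian)
    # Wrapping the input and then filtering matches filtering and then
    # wrapping, which is the property that keeps tiled copies seamless.
    same = roll2d(filtered, 1, 2) == circular_conv2d(roll2d(img, 1, 2), laplacian)
    print("wrap-around equivariance holds:", same)
```

In a deep-learning framework the same effect is usually obtained by switching the padding mode of the convolution layers, for example `padding_mode="circular"` on PyTorch's `torch.nn.Conv2d`. Whether DreamPBR applies this to every layer or only to some is not stated here.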