Material reconstruction from a photograph is a key component of democratizing 3D content creation. We propose to formulate this ill-posed problem as a controlled synthesis problem, leveraging recent progress in generative deep networks. We present ControlMat, a method that, given a single photograph with uncontrolled illumination as input, conditions a diffusion model to generate plausible, tileable, high-resolution physically-based digital materials. We carefully analyze the behavior of diffusion models for multi-channel outputs, adapt the sampling process to fuse multi-scale information, and introduce rolled diffusion to enable both tileability and patched diffusion for high-resolution outputs. Our generative approach further permits exploration of a variety of materials that could correspond to the input image, mitigating the ambiguity caused by unknown lighting conditions. We show that our approach outperforms recent inference and latent-space-optimization methods, and carefully validate our diffusion process design choices. Supplemental materials and additional details are available at: https://gvecchio.com/controlmat/.
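To make the rolled and patched diffusion idea more concrete, the following is a minimal sketch of one possible reading of it, not the authors' implementation. It assumes that "rolling" means circularly shifting the noisy latent by a random offset before each denoising step, processing it patch by patch, and then shifting it back, so that patch seams fall at different positions at every step and the image borders wrap around. The names `denoise_patch` and `rolled_patched_step` are hypothetical, and `denoise_patch` is a placeholder that stands in for a real diffusion model call.

```python
import numpy as np


def denoise_patch(patch, step):
    # Hypothetical stand-in for one denoising step of a diffusion model.
    # A real model would predict and remove noise; here we only scale the input.
    return patch * 0.9


def rolled_patched_step(latent, step, patch_size, rng):
    """Apply one denoising step patch by patch on a circularly shifted latent.

    latent: array of shape (channels, height, width).
    Assumption: height and width are multiples of patch_size.
    """
    h, w = latent.shape[-2:]
    # Draw a random circular shift so patch borders move between steps.
    dy, dx = int(rng.integers(0, h)), int(rng.integers(0, w))
    rolled = np.roll(latent, shift=(dy, dx), axis=(-2, -1))

    out = np.empty_like(rolled)
    for y in range(0, h, patch_size):
        for x in range(0, w, patch_size):
            window = rolled[..., y:y + patch_size, x:x + patch_size]
            out[..., y:y + patch_size, x:x + patch_size] = denoise_patch(window, step)

    # Undo the shift so the result stays aligned with the conditioning image.
    return np.roll(out, shift=(-dy, -dx), axis=(-2, -1))


rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 16, 16))
for step in range(3):
    latent = rolled_patched_step(latent, step, patch_size=8, rng=rng)
```

Because the shift is circular, the left and right edges (and the top and bottom edges) end up adjacent inside some patches, which is the intuition for why such a scheme could encourage tileable outputs while keeping each model call at a fixed patch size.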