Creating ultra-high-resolution spatially varying bidirectional reflectance distribution functions (SVBRDFs) is critical for photorealistic 3D content creation, because close-up rendering requires a faithful representation of fine-scale surface detail. However, 4K generation faces two key challenges: (1) multiple reflectance maps must be synthesized at full resolution, which multiplies the pixel budget and imposes prohibitive memory and computational cost, and (2) the maps must stay tightly aligned at the pixel level at 4K, which is particularly difficult when adapting pretrained models designed for the RGB image domain. We introduce HiMat, a diffusion-based framework tailored for efficient and diverse 4K SVBRDF generation. To address the first challenge, HiMat performs generation in a high-compression latent space via DC-AE and employs a pretrained diffusion transformer with linear attention to improve per-map efficiency. To address the second challenge, we propose CrossStitch, a lightweight convolutional module that enforces cross-map consistency without incurring the cost of global attention. Our experiments show that HiMat achieves high-fidelity 4K SVBRDF generation and outperforms prior methods in efficiency, structural consistency, and diversity. Beyond materials, our framework also generalizes to related applications such as intrinsic decomposition.
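To make the pixel-budget argument concrete, the following back-of-the-envelope sketch counts pixels and latent tokens for a set of 4K maps. The number of maps, the 8x baseline autoencoder with 2x2 patchification, and the 32x high-compression setting are illustrative assumptions, not values stated in the abstract; the helper name `latent_tokens` is hypothetical.

```python
def latent_tokens(resolution: int, downsample: int, patch: int = 1) -> int:
    """Tokens per square map after spatial downsampling and patchification."""
    side = resolution // (downsample * patch)
    return side * side


RES = 4096      # 4K map side length in pixels
NUM_MAPS = 4    # assumption: e.g. base color, normal, roughness, metallic

# Raw pixel budget across all maps.
pixels_total = NUM_MAPS * RES * RES

# Assumed baseline: 8x autoencoder followed by 2x2 patches.
tokens_baseline = latent_tokens(RES, downsample=8, patch=2)

# Assumed high-compression setting: 32x autoencoder, no extra patching.
tokens_compressed = latent_tokens(RES, downsample=32, patch=1)

# Softmax attention scales with the square of the token count,
# linear attention with the token count itself.
quadratic_saving = (tokens_baseline / tokens_compressed) ** 2
linear_saving = tokens_baseline / tokens_compressed

print(f"pixels across maps:     {pixels_total:,}")
print(f"tokens/map (baseline):  {tokens_baseline:,}")
print(f"tokens/map (compressed): {tokens_compressed:,}")
print(f"token reduction: {linear_saving:.0f}x, "
      f"pairwise-attention reduction: {quadratic_saving:.0f}x")
```

Under these assumed settings, each map shrinks from 65,536 to 16,384 tokens, which suggests why a high-compression latent space combined with linear attention keeps per-map cost manageable at 4K.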