Creating ultra-high-resolution spatially varying bidirectional reflectance distribution functions (SVBRDFs) is critical for photorealistic 3D content creation, as it enables faithful representation of the fine-scale surface details required for close-up rendering. However, generating SVBRDFs at 4K faces two key challenges: (1) multiple reflectance maps must be synthesized at full resolution, which multiplies the pixel budget and imposes prohibitive memory and computational costs; and (2) the maps must stay tightly aligned at the pixel level at 4K, which is particularly difficult when adapting pretrained models designed for the RGB image domain. We introduce HiMat, a diffusion-based framework for efficient and diverse 4K SVBRDF generation. To address the first challenge, HiMat performs generation in the high-compression latent space of DC-AE and employs a pretrained diffusion transformer with linear attention to improve per-map efficiency. To address the second challenge, we propose CrossStitch, a lightweight convolutional module that enforces cross-map consistency without incurring the cost of global attention. Our experiments show that HiMat achieves high-fidelity 4K SVBRDF generation with better efficiency, structural consistency, and diversity than prior methods. Beyond materials, the framework also generalizes to related applications such as intrinsic decomposition.
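To make the first challenge concrete, the back-of-envelope sketch below counts transformer tokens for 4K maps and compares attention costs. It is an illustration under stated assumptions, not HiMat's configuration: the number of maps (4), the baseline latent setup (8x VAE with patch size 2), and the DC-AE setup (32x spatial downsampling with patch size 1) are all assumed values, and all function names are hypothetical.

```python
# Back-of-envelope token counts for 4K SVBRDF generation in a latent DiT.
# ASSUMPTIONS (illustrative, not taken from HiMat): 4 reflectance maps;
# baseline = 8x VAE + patch size 2; DC-AE = 32x downsampling + patch size 1.

def tokens_per_map(resolution: int, downsample: int, patch: int) -> int:
    """Number of transformer tokens for one square map."""
    side = resolution // (downsample * patch)
    return side * side

def softmax_attention_pairs(tokens: int) -> int:
    """Query-key pairs scored by full softmax attention: O(N^2)."""
    return tokens * tokens

def linear_attention_cost(tokens: int) -> int:
    """Linear attention scales as O(N) in the token count (constants ignored)."""
    return tokens

def summarize(resolution: int = 4096, num_maps: int = 4) -> dict:
    base = tokens_per_map(resolution, downsample=8, patch=2)
    dcae = tokens_per_map(resolution, downsample=32, patch=1)
    joint_base = num_maps * base
    joint_dcae = num_maps * dcae
    return {
        "pixels_total": num_maps * resolution * resolution,
        "tokens_per_map_baseline": base,
        "tokens_per_map_dcae": dcae,
        # Global softmax attention over all maps' tokens at once; the abstract
        # states CrossStitch enforces consistency without this cost.
        "joint_softmax_pairs_dcae": softmax_attention_pairs(joint_dcae),
        # Ratio of joint softmax-attention work: baseline vs. DC-AE latents.
        "joint_softmax_ratio": softmax_attention_pairs(joint_base)
        // softmax_attention_pairs(joint_dcae),
        # Per-map softmax vs. linear attention on DC-AE latents.
        "per_map_softmax_over_linear": softmax_attention_pairs(dcae)
        // linear_attention_cost(dcae),
    }

if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value:,}")
```

Under these assumptions, four 4K maps hold about 67 million pixels; stronger spatial compression cuts tokens per map by 4x, which shrinks quadratic attention work by 16x. Linear attention then removes the remaining quadratic factor per map (ignoring its per-token constant), while any global attention across all maps would still cost on the order of billions of query-key pairs.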