We present a method for generating physically based materials for 3D shapes, built on a video diffusion transformer architecture. Our method is conditioned on input geometry and a text description, and jointly models multiple material properties (base color, roughness, metallicity, height map) to form physically plausible materials. We further introduce a custom variational auto-encoder that encodes multiple material modalities into a compact latent space, enabling joint generation of all modalities without increasing the token count. Given a text prompt, our pipeline generates high-quality materials for 3D shapes that are compatible with common content creation tools.
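The key idea behind the multi-modality auto-encoder can be illustrated with a toy sketch: the material maps are stacked channel-wise into a single tensor, so the encoder produces one latent token per spatial patch regardless of how many modalities are present. The code below is a hypothetical, heavily simplified illustration (a linear per-patch encoder with reparameterization), not the paper's actual architecture; all resolutions, patch sizes, and latent dimensions are arbitrary assumptions.

```python
import numpy as np

# Toy sketch (NOT the paper's architecture): material modalities are
# concatenated channel-wise, so the number of latent tokens depends only
# on spatial resolution and patch size, not on the number of modalities.

rng = np.random.default_rng(0)

H = W = 32   # texture resolution (assumed)
PATCH = 8    # patch / token size (assumed)
LATENT = 16  # latent channels per token (assumed)

# Material modalities: base color (3), roughness (1), metallicity (1), height (1)
base_color = rng.random((H, W, 3))
roughness = rng.random((H, W, 1))
metallic = rng.random((H, W, 1))
height = rng.random((H, W, 1))

# Channel-wise concatenation yields one 6-channel "material image".
material = np.concatenate([base_color, roughness, metallic, height], axis=-1)
C = material.shape[-1]

# Toy linear VAE encoder: each patch maps to the mean and log-variance
# of one latent token.
W_mu = rng.standard_normal((PATCH * PATCH * C, LATENT)) * 0.01
W_logvar = rng.standard_normal((PATCH * PATCH * C, LATENT)) * 0.01

def encode(img):
    """Split the image into patches; project each to a sampled latent token."""
    h, w, _ = img.shape
    tokens = []
    for i in range(0, h, PATCH):
        for j in range(0, w, PATCH):
            x = img[i:i + PATCH, j:j + PATCH].reshape(-1)
            mu, logvar = x @ W_mu, x @ W_logvar
            eps = rng.standard_normal(LATENT)  # reparameterization trick
            tokens.append(mu + np.exp(0.5 * logvar) * eps)
    return np.stack(tokens)

latents = encode(material)
print(latents.shape)  # (16, 16): (H//PATCH)*(W//PATCH) tokens of LATENT dims
```

Note that encoding a single-modality image of the same resolution would produce the same 16 tokens; the extra modalities only widen the encoder's input channels, which is the property the abstract highlights.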