Creating photorealistic materials for 3D rendering requires exceptional artistic skill. Generative models for materials could help, but they are currently limited by a lack of high-quality training data. Recent video generative models effortlessly produce realistic material appearances, but this knowledge remains entangled with geometry and lighting. We present VideoNeuMat, a two-stage pipeline that extracts reusable neural material assets from video diffusion models. First, we finetune a large video model (Wan 2.1 14B) to generate material sample videos under controlled camera and lighting trajectories. This effectively creates a "virtual gonioreflectometer" that preserves the model's material realism while learning a structured measurement pattern. Second, we reconstruct compact neural materials from these videos with a Large Reconstruction Model (LRM) finetuned from a smaller Wan 1.3B video backbone. From 17 generated video frames, our LRM predicts, in a single forward pass, neural material parameters that generalize to novel viewing and lighting conditions. The resulting materials exhibit realism and diversity far exceeding those of the limited synthetic training data. This demonstrates that material knowledge can be successfully transferred from internet-scale video models into standalone, reusable neural 3D assets.
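The abstract does not specify the measurement pattern or the model interfaces, so the following is only a minimal Python sketch of the two-stage data flow under stated assumptions. The schedule in `gonio_trajectory` is hypothetical: the camera and light orbit in opposite directions while tilting from the normal up to 60°. It stands in for the learned "virtual gonioreflectometer" pattern. `generate_video` and `reconstruct` are hypothetical callables standing in for the finetuned Wan 2.1 14B generator and the Wan 1.3B-based LRM. The parameter vector they return is likewise a placeholder for the neural material.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Condition = Tuple[Vec3, Vec3]  # (view direction, light direction)


def hemisphere_dir(theta: float, phi: float) -> Vec3:
    """Unit vector with polar angle theta from the surface normal (+z) and azimuth phi."""
    s = math.sin(theta)
    return (s * math.cos(phi), s * math.sin(phi), math.cos(theta))


def gonio_trajectory(num_frames: int = 17,
                     max_theta: float = math.radians(60.0)) -> List[Condition]:
    """Hypothetical fixed measurement schedule (not the paper's actual pattern).

    The camera makes one azimuthal turn while the light orbits the opposite way;
    both tilt from the normal (theta = 0) up to max_theta over the sequence.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames")
    traj = []
    for i in range(num_frames):
        t = i / (num_frames - 1)  # progress along the video in [0, 1]
        theta, phi = max_theta * t, 2.0 * math.pi * t
        view = hemisphere_dir(theta, phi)
        light = hemisphere_dir(theta, math.pi - phi)  # azimuth mirrored, moves opposite
        traj.append((view, light))
    return traj


def extract_neural_material(
    prompt: str,
    generate_video: Callable[[str, Sequence[Condition]], List[object]],
    reconstruct: Callable[[List[object]], List[float]],
    num_frames: int = 17,
) -> List[float]:
    """Two-stage flow: render a structured sample video, then invert it in one pass."""
    traj = gonio_trajectory(num_frames)
    # Stage 1: stand-in for the finetuned video model conditioned on the trajectory.
    frames = generate_video(prompt, traj)
    if len(frames) != num_frames:
        raise ValueError(f"expected {num_frames} frames, got {len(frames)}")
    # Stage 2: stand-in for the LRM; one call maps all frames to material parameters.
    # The schedule is fixed, so only the frames are passed in this sketch.
    return reconstruct(frames)


if __name__ == "__main__":
    fake_generator = lambda prompt, traj: [f"frame_{i}" for i in range(len(traj))]
    fake_lrm = lambda frames: [0.0] * 8  # hypothetical 8-D material latent
    print(extract_neural_material("weathered oak planks", fake_generator, fake_lrm))
```

Because the trajectory is fixed, the reconstruction stage in this sketch consumes the 17 frames alone, mirroring the single-pass inference the abstract describes. A real system would replace the stubs with model calls and return a neural material that can be evaluated at novel view and light directions rather than a flat parameter list.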