Estimating the material property fields of 3D assets is critical for physics-based simulation, robotics, and digital twin generation. Existing vision-based approaches are either too expensive and slow or rely on 3D information. We present SLAT-Phys, an end-to-end method that predicts spatially varying material property fields of 3D assets directly from a single RGB image, without explicit 3D reconstruction. Our approach leverages spatially organised latent features from a pretrained 3D asset generation model that encodes rich geometric and semantic priors, and trains a lightweight neural decoder on these features to estimate Young's modulus, density, and Poisson's ratio. The coarse volumetric layout of the latent representation, together with the semantic cues it carries about object geometry and appearance, enables accurate material estimation. Our experiments show that SLAT-Phys achieves accuracy competitive with prior approaches in predicting continuous material parameters while significantly reducing computation time. In particular, SLAT-Phys requires only 9.9 seconds per object on an NVIDIA RTX A5000 GPU and avoids reconstruction and voxelization preprocessing. This yields a 120x speedup over prior methods for material property estimation from a single image.
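To make the decoding step concrete, the following is a minimal sketch of a lightweight per-voxel decoder that maps latent features to the three material properties named above. It is not the SLAT-Phys architecture, which the abstract does not specify: the feature size, hidden width, single hidden layer, output parameterisation (exponential for positivity, scaled sigmoid to keep Poisson's ratio in (0, 0.5)), and all function names are assumptions chosen for illustration. The weights are random and untrained, so the numeric outputs carry no physical meaning.

```python
import math
import random

LATENT_DIM = 8   # assumed feature size per occupied voxel (hypothetical)
HIDDEN_DIM = 16  # assumed hidden width (hypothetical)


def init_layer(n_in, n_out, rng):
    """Create a dense layer as (weights, biases) with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Apply an affine map: one dot product plus bias per output unit."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def decode_voxel(feature, hidden_layer, output_layer):
    """Map one latent feature vector to three material properties."""
    hidden = [max(0.0, v) for v in dense(feature, hidden_layer)]  # ReLU
    raw_e, raw_rho, raw_nu = dense(hidden, output_layer)
    return {
        # Assumption: predict in log space so the values stay positive.
        "youngs_modulus": math.exp(raw_e),
        "density": math.exp(raw_rho),
        # Assumption: squash Poisson's ratio into the open interval (0, 0.5).
        "poissons_ratio": 0.5 * sigmoid(raw_nu),
    }


def decode_field(latents, hidden_layer, output_layer):
    """Decode a sparse latent grid {voxel coordinate: feature vector}."""
    return {coord: decode_voxel(feat, hidden_layer, output_layer)
            for coord, feat in latents.items()}


# Toy input: three occupied voxels with random features standing in for
# the latents a pretrained 3D generation model would produce.
rng = random.Random(0)
hidden_layer = init_layer(LATENT_DIM, HIDDEN_DIM, rng)
output_layer = init_layer(HIDDEN_DIM, 3, rng)
latents = {
    (0, 0, 0): [rng.uniform(-1.0, 1.0) for _ in range(LATENT_DIM)],
    (0, 1, 0): [rng.uniform(-1.0, 1.0) for _ in range(LATENT_DIM)],
    (1, 1, 2): [rng.uniform(-1.0, 1.0) for _ in range(LATENT_DIM)],
}
field = decode_field(latents, hidden_layer, output_layer)
for coord, props in field.items():
    print(coord, props)
```

Because the decoder runs independently on each occupied voxel of the latent grid, the output is a spatially varying field with one property triple per voxel, and no mesh reconstruction or separate voxelization step is needed in this sketch.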