Estimating physically accurate, simulation-ready garments from a single image is challenging due to the absence of image-to-physics datasets and the ill-posed nature of the problem. Prior methods either require multi-view capture and expensive differentiable simulation or predict only garment geometry, without the material properties required for realistic simulation. We propose a feed-forward framework that sidesteps these limitations by first fine-tuning a vision-language model to infer material composition and fabric attributes from real images, and then training a lightweight predictor that maps these attributes to the corresponding physical fabric parameters using a small dataset of material-physics measurements. Our approach introduces two new datasets (FTAG and T2P) and delivers simulation-ready garments from a single image without iterative optimization. Experiments show that our estimator achieves superior accuracy in material composition estimation and fabric attribute prediction, and that passing these predictions through our physics parameter estimator yields higher-fidelity simulations than state-of-the-art image-to-garment methods.
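To make the second stage concrete, below is a minimal sketch of a lightweight attribute-to-physics predictor of the kind described above: a small regressor mapping an inferred fabric-attribute vector to cloth-simulation parameters. All names, dimensions, and the choice of architecture (a two-hidden-layer MLP) are hypothetical illustrations, not the paper's actual implementation; random tensors stand in for a T2P-style dataset of material-physics measurements.

```python
# Hypothetical sketch of a lightweight attribute-to-physics predictor.
# Attribute names, parameter names, and dimensions are assumptions.
import torch
import torch.nn as nn


class AttributeToPhysics(nn.Module):
    """Maps a fabric-attribute vector (e.g., VLM-inferred scores for
    stiffness, drape, thickness) to physical simulation parameters
    (e.g., stretch stiffness, bending stiffness, density)."""

    def __init__(self, n_attributes: int = 8, n_params: int = 3, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_attributes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_params),
        )

    def forward(self, attributes: torch.Tensor) -> torch.Tensor:
        # Softplus keeps the predicted physical parameters positive.
        return nn.functional.softplus(self.net(attributes))


# Supervised training against a small set of measured physical parameters
# (random tensors here, standing in for real material-physics data).
model = AttributeToPhysics()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
attrs = torch.rand(32, 8)     # batch of fabric-attribute vectors
targets = torch.rand(32, 3)   # measured physical parameters
for _ in range(100):
    loss = nn.functional.mse_loss(model(attrs), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because this stage is a small feed-forward regressor rather than a differentiable simulator, inference is a single forward pass, which is what allows the full pipeline to avoid iterative optimization.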