Estimating physically accurate, simulation-ready garments from a single image is challenging due to the absence of image-to-physics datasets and the ill-posed nature of the problem. Prior methods either require multi-view capture and expensive differentiable simulation, or predict only garment geometry without the material properties required for realistic simulation. We propose a feed-forward framework that sidesteps these limitations by first fine-tuning a vision-language model to infer material composition and fabric attributes from real images, and then training a lightweight predictor that maps these attributes to the corresponding physical fabric parameters using a small dataset of material-physics measurements. Our approach introduces two new datasets (FTAG and T2P) and delivers simulation-ready garments from a single image without iterative optimization. Experiments show that our estimator achieves superior accuracy in material composition estimation and fabric attribute prediction, and that passing these predictions through our physics parameter estimator yields higher-fidelity simulations than state-of-the-art image-to-garment methods.
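To make the two-stage design concrete, the following is a minimal sketch of the inference pipeline, assuming a PyTorch implementation. The `infer_attributes` interface, the attribute dimensionality, and the specific physical parameter set (density, stretching stiffness, bending stiffness) are all assumptions for illustration, not the paper's actual API.

```python
# Minimal sketch of the two-stage pipeline (all names hypothetical).
import torch
import torch.nn as nn

class PhysicsParamPredictor(nn.Module):
    """Lightweight MLP mapping fabric attributes to simulation parameters.
    The output set (e.g. density, stretching and bending stiffness) is an
    assumed example; the paper's parameterization may differ."""
    def __init__(self, attr_dim: int = 16, param_dim: int = 3):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(attr_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, param_dim),
        )

    def forward(self, attrs: torch.Tensor) -> torch.Tensor:
        return self.mlp(attrs)

def estimate_simulation_params(image, vlm, predictor: PhysicsParamPredictor):
    """Stage 1: a fine-tuned VLM infers material composition and fabric
    attributes from the input image. Stage 2: the lightweight predictor
    maps those attributes to physical fabric parameters.
    `vlm.infer_attributes` is a placeholder for whatever interface the
    fine-tuned model exposes."""
    composition, attrs = vlm.infer_attributes(image)  # hypothetical call
    with torch.no_grad():
        params = predictor(attrs)
    return composition, params
```

Because both stages are feed-forward, a single pass through the VLM and the MLP produces simulation-ready parameters, consistent with the abstract's claim of avoiding iterative optimization.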