Estimating physically accurate, simulation-ready garments from a single image is challenging due to the absence of image-to-physics datasets and the ill-posed nature of the problem. Prior methods either require multi-view capture and expensive differentiable simulation or predict only garment geometry, without the material properties required for realistic simulation. We propose a feed-forward framework that sidesteps these limitations: we first fine-tune a vision-language model to infer material composition and fabric attributes from real images, then train a lightweight predictor that maps these attributes to the corresponding physical fabric parameters using a small dataset of material-physics measurements. Our approach introduces two new datasets (FTAG and T2P) and delivers simulation-ready garments from a single image without iterative optimization. Experiments show that our estimator achieves superior accuracy in material composition estimation and fabric attribute prediction, and that passing these predictions through our physics parameter estimator yields higher-fidelity simulations than state-of-the-art image-to-garment methods.