Recent advances in garment pattern generation have shown promising progress. However, existing feed-forward methods struggle with diverse poses and viewpoints, while optimization-based approaches are computationally expensive and difficult to scale. This paper focuses on sewing pattern generation for garment modeling and fabrication applications that demand editable, separable, and simulation-ready garments. We propose DressWild, a novel feed-forward pipeline that reconstructs physics-consistent 2D sewing patterns and the corresponding 3D garments from a single in-the-wild image. Given an input image, our method leverages vision-language models (VLMs) to normalize pose variations at the image level, then extracts pose-aware, 3D-informed garment features. These features are fused through a transformer-based encoder and subsequently used to predict sewing pattern parameters, which can be directly applied to physical simulation, texture synthesis, and multi-layer virtual try-on. Extensive experiments demonstrate that our approach robustly recovers diverse sewing patterns and the corresponding 3D garments from in-the-wild images without requiring multi-view inputs or iterative optimization, offering an efficient and scalable solution for realistic garment simulation and animation.