Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items

Mengting Chen, Zhengrui Chen, Yongchao Du, Zuan Gao, Taihang Hu, Jinsong Lan, Chao Lin, Yefeng Shen, Xingjian Wang, Zhao Wang, Zhengtao Wu, Xiaoli Xu, Zhengze Xu, Hao Yan, Mingzhou Zhang, Jun Zheng, Qinye Zhou, Xiaoyong Zhu, Bo Zheng

From arXiv, 24 pages, model evaluation report

Recent advances in image generation and editing have opened new opportunities for virtual try-on. However, existing methods still struggle to meet complex real-world demands. We present Tstars-Tryon 1.0, a commercial-scale virtual try-on system that is robust, realistic, versatile, and highly efficient. First, our system maintains a high success rate across challenging cases such as extreme poses, severe illumination variations, motion blur, and other in-the-wild conditions. Second, it delivers highly photorealistic results with fine-grained details, faithfully preserving garment texture, material properties, and structural characteristics, while largely avoiding common AI-generated artifacts. Third, beyond apparel try-on, our model supports flexible multi-image composition (up to 6 reference images) across 8 fashion categories, with coordinated control over person identity and background. Fourth, to overcome the latency bottlenecks of commercial deployment, our system is heavily optimized for inference speed, delivering near real-time generation for a seamless user experience. These capabilities are enabled by an integrated system design spanning end-to-end model architecture, a scalable data engine, robust infrastructure, and a multi-stage training paradigm. Extensive evaluation and large-scale product deployment demonstrate that Tstars-Tryon 1.0 achieves leading overall performance. To support future research, we also release a comprehensive benchmark. The model has been deployed at industrial scale on the Taobao App, serving millions of users with tens of millions of requests.
