With the rapid advancement of generative AI, virtual try-on (VTON) systems are becoming increasingly common in e-commerce and digital entertainment. However, the growing realism of AI-generated try-on content raises pressing concerns about authenticity and responsible use. To address this, we present VTONGuard, a large-scale benchmark dataset containing over 775,000 real and synthetic try-on images. The dataset covers diverse real-world conditions, including variations in pose, background, and garment styles, and provides both authentic and manipulated examples. Based on this benchmark, we conduct a systematic evaluation of multiple detection paradigms under unified training and testing protocols. Our results reveal each method's strengths and weaknesses and highlight the persistent challenge of cross-paradigm generalization. To further advance detection, we design a multi-task framework that integrates auxiliary segmentation to enhance boundary-aware feature learning, achieving the best overall performance on VTONGuard. We expect this benchmark to enable fair comparisons, facilitate the development of more robust detection models, and promote the safe and responsible deployment of VTON technologies in practice.