Accurate product information is critical for e-commerce stores because it allows customers to browse, filter, and search for products. Missing or incorrect information degrades product data quality and results in a poor customer experience. Machine learning can be used to correct inaccurate or missing information, but achieving high performance on fashion image classification tasks requires large amounts of annotated data, which is expensive to produce because of manual labeling costs. One solution is to generate synthetic data, which requires no manual labeling. However, a model trained solely on synthetic images can generalize poorly at inference time on real-world data because of the domain shift between synthetic and real images. We introduce a new unsupervised domain adaptation technique that converts images from the synthetic domain into the real-world domain. Our approach combines a generative neural network and a classifier that are jointly trained to produce realistic images while preserving the synthetic label information. We found that using real-world pseudo-labels during training helps the classifier generalize to the real-world domain, reducing the synthetic bias. We successfully train a visual pattern classification model in the fashion domain without any real-world annotations. Experiments show that our method outperforms other unsupervised domain adaptation algorithms.
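The abstract does not state how the real-world pseudo-labels are produced or how they enter the training objective. One common approach, assumed here purely for illustration, is to keep only the classifier's confident predictions on unlabeled real images and add their cross-entropy to the supervised loss computed on translated synthetic images. The minimal sketch below covers only the classifier side; the generator and its adversarial loss are omitted, and the confidence threshold, the loss weight, and every function name are hypothetical rather than taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of class logits."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_norm for z in logits]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Negative log-likelihood of `label` under softmax(logits)."""
    return -log_softmax(logits)[label]


def select_pseudo_labels(
    real_logits: Sequence[Sequence[float]], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Hypothetical rule: keep (index, predicted class) for real images whose
    top-class probability is at least `threshold`; discard the rest."""
    selected = []
    for i, logits in enumerate(real_logits):
        probs = [math.exp(lp) for lp in log_softmax(logits)]
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            selected.append((i, label))
    return selected


def classifier_loss(
    translated_logits: Sequence[Sequence[float]],
    synthetic_labels: Sequence[int],
    real_logits: Sequence[Sequence[float]],
    threshold: float = 0.9,
    weight: float = 1.0,
) -> float:
    """Supervised loss on synthetic images translated to the real domain
    (using their known synthetic labels), plus a weighted loss on confident
    real-world pseudo-labels. Assumed form, not the paper's objective."""
    supervised = sum(
        cross_entropy(z, y) for z, y in zip(translated_logits, synthetic_labels)
    ) / len(translated_logits)
    pseudo = select_pseudo_labels(real_logits, threshold)
    if not pseudo:
        # No real image is confident enough yet: fall back to the supervised term.
        return supervised
    pseudo_term = sum(cross_entropy(real_logits[i], y) for i, y in pseudo) / len(pseudo)
    return supervised + weight * pseudo_term
```

For example, `select_pseudo_labels([[4.0, 0.0, 0.0], [0.2, 0.1, 0.0]])` keeps only the first real image (top-class probability of about 0.96) and drops the second, whose prediction is close to uniform. A high threshold trades coverage for label quality: early in training few real images pass, so the supervised synthetic term dominates until the classifier becomes confident on the real domain.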