Multi-label classification models have a wide range of applications in e-commerce, including vision-based label prediction and language-based sentiment classification. A major challenge in achieving satisfactory real-world performance on these tasks is the pronounced imbalance in the data distribution. For instance, in fashion attribute detection, there may be only six 'puff sleeve' garments among 1,000 products in most e-commerce fashion catalogs. To address this issue, we explore more data-efficient model training techniques instead of acquiring a huge number of annotations to collect sufficient samples, which is neither economical nor scalable. In this paper, we propose a new weighted objective function to boost the performance of deep neural networks (DNNs) for multi-label classification with long-tailed data distributions. Our experiments involve image-based attribute classification of fashion apparel, and the results demonstrate favorable performance for the new weighting method compared to non-weighted and inverse-frequency-based weighting mechanisms. We further evaluate the robustness of the new weighting mechanism using two popular fashion attribute types in today's fashion industry: sleevetype and archetype.
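To make the inverse-frequency baseline mentioned above concrete, the following is a minimal sketch of one common way to weight a multi-label binary cross-entropy loss by inverse positive frequency. It is not the weighting function proposed in this paper, whose form is not given here. The function names `inverse_frequency_weights` and `weighted_bce`, and the second label with 400 positives, are hypothetical and chosen only for illustration; the rare label mirrors the 6-in-1,000 'puff sleeve' example.

```python
import numpy as np

def inverse_frequency_weights(labels):
    """Hypothetical helper: per-label positive weight = 1 / positive frequency."""
    labels = np.asarray(labels, dtype=float)
    pos_freq = labels.mean(axis=0)              # fraction of positives per label
    pos_freq = np.clip(pos_freq, 1e-12, None)   # guard against labels with no positives
    return 1.0 / pos_freq

def weighted_bce(probs, labels, pos_weight):
    """Hypothetical helper: binary cross-entropy with up-weighted positive terms."""
    probs = np.clip(np.asarray(probs, dtype=float), 1e-7, 1 - 1e-7)
    labels = np.asarray(labels, dtype=float)
    loss = -(pos_weight * labels * np.log(probs)
             + (1.0 - labels) * np.log(1.0 - probs))
    return loss.mean()

# Toy catalog (assumed data): 1,000 products and two attribute labels.
# Label 0 is rare (6 positives, as in the 'puff sleeve' example);
# label 1 is common (400 positives, an illustrative value).
labels = np.zeros((1000, 2))
labels[:6, 0] = 1
labels[:400, 1] = 1

weights = inverse_frequency_weights(labels)

# Compare losses for an uninformative model that predicts 0.5 everywhere.
probs = np.full(labels.shape, 0.5)
loss_weighted = weighted_bce(probs, labels, weights)
loss_unweighted = weighted_bce(probs, labels, np.ones(2))
```

Under this scheme the rare label receives a much larger positive weight than the common one, so errors on the few positive examples contribute more to the loss. Very large weights of this kind can also destabilize training, which is one reason alternative weighting schemes are studied.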