Fine-grained multi-label classification models have broad applications in e-commerce, where vision-based label prediction ranges from fashion attribute detection to brand recognition. One challenge in achieving satisfactory performance on these tasks in the real world is the unconstrained visual background, whose irrelevant pixels make it hard for the model to focus on the region of interest and base its prediction on that region. In this paper, we introduce a generic semantic-embedding deep neural network that incorporates spatially aware semantic features into a channel-wise attention based model, using this localization guidance to boost multi-label prediction performance. We observe an average relative improvement of 15.27% in AUC score across all labels compared to the baseline approach. The core experiment and ablation studies involve multi-label fashion attribute classification on Instagram images of fashion apparel. We compare model performance among our approach, the baseline approach, and three alternative approaches to leveraging semantic features. The results show favorable performance for our approach.
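The reported average relative improvement in AUC can be read in more than one way. The sketch below shows one plausible reading, assumed here rather than confirmed by the abstract: compute the relative AUC gain over the baseline for each label, then average those gains across labels. The function name `avg_relative_improvement`, the label names, and all AUC values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from statistics import mean


def avg_relative_improvement(baseline_auc, model_auc):
    """Mean over labels of (model - baseline) / baseline, as a percentage.

    Assumption: the gain is computed per label first and then averaged,
    rather than computed once on the mean AUC.
    """
    if baseline_auc.keys() != model_auc.keys():
        raise ValueError("both models must be scored on the same labels")
    gains = [
        (model_auc[label] - baseline_auc[label]) / baseline_auc[label]
        for label in baseline_auc
    ]
    return 100.0 * mean(gains)


# Made-up per-label AUC values for three hypothetical attribute labels.
baseline = {"floral": 0.50, "striped": 0.80, "denim": 0.40}
ours = {"floral": 0.60, "striped": 0.84, "denim": 0.50}

improvement = avg_relative_improvement(baseline, ours)
print(f"{improvement:.2f}%")  # prints 16.67%
```

Averaging per-label gains gives every label equal weight, so a large gain on a label with a weak baseline moves the figure more than the same absolute gain on a label that was already well classified.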