Fine-grained multi-label classification models have broad applications in Amazon production features, such as visual label prediction tasks ranging from fashion attribute detection to brand recognition. One challenge to achieving satisfactory performance on these tasks in the real world is the in-the-wild visual background, which contains irrelevant pixels that make it hard for the model to focus on the region of interest and base its prediction on that region. In this paper, we introduce a generic semantic-embedding deep neural network that incorporates spatially aware semantic features into a channel-wise attention-based model, using localization guidance to boost performance on multi-label prediction. We observed an average relative improvement of 15.27% in AUC score across all labels compared to the baseline approach. The core experiments and ablation studies involve multi-label fashion attribute classification on Instagram fashion apparel images. We compared the performance of our approach, the baseline approach, and three alternative approaches to leveraging semantic features. The results show favorable performance for our approach.
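For readers unfamiliar with the headline metric, the following is a minimal sketch of one plausible way to compute an average relative improvement in AUC across labels. It assumes the figure is the mean of per-label relative gains; the abstract does not say whether the average was taken this way or over aggregate AUC. The function name `avg_relative_improvement`, the label names, and the AUC values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def avg_relative_improvement(baseline_auc, model_auc):
    """Return the mean over labels of (model - baseline) / baseline, in percent.

    Assumption: both arguments map label name -> AUC score for the same labels.
    """
    if baseline_auc.keys() != model_auc.keys():
        raise ValueError("both models must be scored on the same labels")
    gains = [
        (model_auc[label] - baseline_auc[label]) / baseline_auc[label]
        for label in baseline_auc
    ]
    return 100.0 * mean(gains)


# Hypothetical per-label AUC scores, NOT results from the paper.
baseline = {"sleeve_length": 0.50, "neckline": 0.80}
improved = {"sleeve_length": 0.60, "neckline": 0.84}

print(f"{avg_relative_improvement(baseline, improved):.2f}%")
```

Averaging per-label relative gains weights every label equally, so a label with a low baseline AUC can dominate the result; this is worth keeping in mind when reading a single summary percentage.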