Fine-grained multi-label classification models have broad applications in Amazon production features, such as vision-based label prediction ranging from fashion attribute detection to brand recognition. One challenge in achieving satisfactory real-world performance on these classification tasks is cluttered, in-the-wild visual background. Its irrelevant pixels distract the model from focusing on the region of interest and from making predictions based on that region. In this paper, we introduce a generic semantic-embedding deep neural network that applies spatially aware semantic features, combined with a channel-wise attention model, to leverage localization guidance and boost multi-label prediction performance. We observed an average relative improvement of 15.27% in AUC across all labels compared to the baseline approach. The core experiments and ablation studies involve multi-label fashion attribute classification on Instagram images of fashion apparel. We compared the performance of our approach, the baseline approach, and three alternative approaches for leveraging semantic features. The results favor our approach.
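The abstract describes the attention component only at a high level. As a rough illustration of how localization guidance can steer channel-wise attention, the sketch below implements squeeze-and-excitation-style channel gating in which the "squeeze" step pools each channel under a semantic region-of-interest mask, so background pixels carry little weight. The mask-weighted pooling, the two-layer gate, and the function names `masked_channel_descriptor` and `channel_attention` are assumptions for illustration, not the authors' architecture. Plain Python lists stand in for tensors to keep it dependency-free.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]   # H x W (or a weight matrix)
Tensor3 = List[Matrix]       # C x H x W feature map


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights: Matrix, vec: List[float]) -> List[float]:
    """Dense layer without bias: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def masked_channel_descriptor(features: Tensor3, mask: Matrix) -> List[float]:
    """Mask-weighted global average pooling, one scalar per channel.

    Assumption: `mask` holds per-pixel region-of-interest weights in [0, 1],
    e.g. from a hypothetical semantic segmentation branch.
    """
    total = max(sum(sum(row) for row in mask), 1e-8)  # avoid division by zero
    descriptor = []
    for channel in features:
        weighted = sum(v * m for frow, mrow in zip(channel, mask)
                       for v, m in zip(frow, mrow))
        descriptor.append(weighted / total)
    return descriptor


def channel_attention(features: Tensor3, mask: Matrix,
                      w_reduce: Matrix, w_expand: Matrix) -> Tuple[Tensor3, List[float]]:
    """SE-style gating: squeeze (masked pooling) -> ReLU layer -> sigmoid layer -> rescale."""
    z = masked_channel_descriptor(features, mask)
    hidden = [max(0.0, h) for h in matvec(w_reduce, z)]    # ReLU bottleneck
    gates = [sigmoid(g) for g in matvec(w_expand, hidden)]  # per-channel weight in (0, 1)
    reweighted = [[[v * gate for v in row] for row in channel]
                  for channel, gate in zip(features, gates)]
    return reweighted, gates


if __name__ == "__main__":
    # Channel 0 fires inside the region of interest; channel 1 fires on background.
    feats = [[[4.0, 0.0], [0.0, 0.0]],
             [[0.0, 4.0], [4.0, 4.0]]]
    roi = [[1.0, 0.0], [0.0, 0.0]]
    eye = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, for illustration only
    _, gates = channel_attention(feats, roi, eye, eye)
    print("gates:", gates)
```

With an all-ones mask, `masked_channel_descriptor` reduces to plain global average pooling and returns `[1.0, 3.0]` for the demo features, so the background-heavy channel would dominate. With the region-of-interest mask it returns `[4.0, 0.0]`, and the gates favor the channel that responds inside the region. In a real model the weights would be learned rather than fixed to the identity.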