Product attribute value extraction plays an important role in many real-world e-Commerce applications, such as product search and recommendation. Previous methods treat it as a sequence labeling task, which requires additional annotation of the positions of values in the product text. This limits their applicability to real-world scenarios in which attribute values are only weakly annotated for each product, without their positions. Moreover, these methods use only the product text (i.e., product title and description) and do not consider the semantic connection between the multiple attribute values of a given product and its text, which can help attribute value extraction. In this paper, we reformulate the task as a multi-label classification task that applies to real-world scenarios in which only annotations of attribute values are available to train models (i.e., positional annotations of attribute values are not available). We propose a classification model with semantic matching and negative label sampling for attribute value extraction. Semantic matching aims to capture semantic interactions between the attribute values of a given product and its text. Negative label sampling aims to enhance the model's ability to distinguish similar values belonging to the same attribute. Experimental results on three subsets of a large real-world e-Commerce dataset demonstrate the effectiveness and superiority of our proposed model.
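To make the idea of negative label sampling concrete, the following is a minimal sketch rather than the paper's implementation. The abstract does not specify how negatives are drawn, so uniform sampling is an assumption here, and the function name `sample_negative_labels`, the `attribute_values` vocabulary, and the example product are all hypothetical. The sketch only illustrates the stated goal: for each attribute a product has, draw other values of that same attribute as negative labels, so that a classifier must learn to tell similar values apart.

```python
import random
from typing import Dict, List, Set, Tuple

Label = Tuple[str, str]  # (attribute, value)


def sample_negative_labels(
    positives: Set[Label],
    attribute_values: Dict[str, List[str]],
    num_negatives: int,
    rng: random.Random,
) -> List[Label]:
    """Draw up to `num_negatives` negative values per attribute of the product.

    Negatives share an attribute with a positive label but carry a different
    value. Uniform sampling is an assumption, not the paper's stated strategy.
    """
    negatives: List[Label] = []
    # Visit each attribute once, in a fixed order, for reproducibility.
    for attribute in sorted({attr for attr, _ in positives}):
        candidates = [
            value
            for value in attribute_values.get(attribute, [])
            if (attribute, value) not in positives
        ]
        k = min(num_negatives, len(candidates))
        negatives.extend((attribute, value) for value in rng.sample(candidates, k))
    return negatives


# Hypothetical label vocabulary and weakly annotated product (no positions).
attribute_values = {
    "color": ["red", "dark red", "blue", "green"],
    "material": ["cotton", "wool", "linen"],
}
positives = {("color", "red"), ("material", "cotton")}
negatives = sample_negative_labels(positives, attribute_values, 2, random.Random(0))
print(negatives)
```

In a multi-label classification setup, the positive labels would receive target 1 and the sampled negatives target 0. Restricting negatives to the same attribute is what forces the model to separate close values such as "red" and "dark red".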