Product Attribute Value Identification (PAVI) involves identifying attribute values from product profiles, a key task for improving product search, recommendations, and business analytics on e-commerce platforms. However, existing PAVI methods face critical challenges, such as inferring implicit values, handling out-of-distribution (OOD) values, and producing normalized outputs. To address these limitations, we introduce Taxonomy-Aware Contrastive Learning Retrieval (TACLR), the first retrieval-based method for PAVI. TACLR formulates PAVI as an information retrieval task: it encodes product profiles and candidate values into embeddings and retrieves values based on their similarity to the item embedding. It leverages contrastive training with taxonomy-aware hard negative sampling and employs adaptive inference with dynamic thresholds. TACLR offers three key advantages: (1) it effectively handles implicit and OOD values while producing normalized outputs; (2) it scales to thousands of categories, tens of thousands of attributes, and millions of values; and (3) it supports efficient inference for high-load industrial scenarios. Extensive experiments on proprietary and public datasets validate the effectiveness and efficiency of TACLR. Moreover, it has been successfully deployed on a real-world e-commerce platform, processing millions of product listings daily while supporting dynamic, large-scale attribute taxonomies.
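To make the retrieval formulation concrete, the following is a minimal, hypothetical sketch of its three ingredients; it is not the TACLR implementation. It assumes (none of this is specified above) that similarity is cosine similarity, that the contrastive objective is a standard InfoNCE loss with temperature `tau`, that "taxonomy-aware hard negatives" are the sibling values of the gold value under the same (category, attribute) node, and that the dynamic threshold is a per-(category, attribute) cutoff below which no value is returned. The helper names (`taxonomy_hard_negatives`, `identify_value`), the toy taxonomy, and the hand-written vectors standing in for encoder outputs are all illustrative.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical taxonomy layout: category -> attribute -> list of normalized values.
Taxonomy = Dict[str, Dict[str, List[str]]]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def taxonomy_hard_negatives(taxonomy: Taxonomy, category: str,
                            attribute: str, positive: str) -> List[str]:
    """Assumed sampler: sibling values of the gold value under the same
    (category, attribute) node, which are the hardest to tell apart."""
    return [v for v in taxonomy[category][attribute] if v != positive]


def info_nce_loss(item: Vector, positive: Vector,
                  negatives: List[Vector], tau: float = 0.07) -> float:
    """InfoNCE: negative log-softmax of the positive's scaled similarity
    among the positive and the negatives (computed stably via log-sum-exp)."""
    logits = [cosine(item, positive) / tau] + [cosine(item, n) / tau for n in negatives]
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denom - logits[0]


def identify_value(item: Vector, category: str, attribute: str,
                   taxonomy: Taxonomy, value_emb: Dict[str, Vector],
                   thresholds: Dict[Tuple[str, str], float],
                   default_threshold: float = 0.5) -> Optional[str]:
    """Retrieve the most similar candidate value for (category, attribute);
    return None when its score falls below that node's threshold."""
    best_value: Optional[str] = None
    best_score = -math.inf
    for value in taxonomy[category][attribute]:
        score = cosine(item, value_emb[value])
        if score > best_score:
            best_value, best_score = value, score
    threshold = thresholds.get((category, attribute), default_threshold)
    return best_value if best_score >= threshold else None


# Toy example: hand-written vectors stand in for trained encoder outputs.
taxonomy: Taxonomy = {"T-Shirts": {"Color": ["red", "blue", "black"]}}
value_emb: Dict[str, Vector] = {
    "red": [1.0, 0.0, 0.0],
    "blue": [0.0, 1.0, 0.0],
    "black": [0.0, 0.0, 1.0],
}
item: Vector = [0.9, 0.1, 0.0]  # pretend embedding of a product profile
thresholds = {("T-Shirts", "Color"): 0.8}

negs = taxonomy_hard_negatives(taxonomy, "T-Shirts", "Color", "red")
loss = info_nce_loss(item, value_emb["red"], [value_emb[n] for n in negs])
print(identify_value(item, "T-Shirts", "Color", taxonomy, value_emb, thresholds))  # red
```

Because candidates are restricted to the values registered under the product's (category, attribute) node, every prediction is by construction a normalized taxonomy value, and returning `None` below the threshold is one simple way to model "no value present". In a production setting the value embeddings would typically be precomputed and searched with an approximate nearest-neighbor index rather than scanned linearly as here.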