Identifying attribute values from product profiles is a key task for improving product search, recommendation, and business analytics on e-commerce platforms; we call this task Product Attribute Value Identification (PAVI). However, existing PAVI methods face critical challenges, such as cascading errors, an inability to handle out-of-distribution (OOD) attribute values, and a lack of generalization capability. To address these limitations, we introduce Multi-Value-Product Retrieval-Augmented Generation (MVP-RAG), which combines the strengths of the retrieval, generation, and classification paradigms. MVP-RAG defines PAVI as a retrieval-generation task, where the product title description serves as the query, and products and attribute values act as the corpus. It first retrieves similar products of the same category together with candidate attribute values, and then generates the standardized attribute values. The key advantages of this work are: (1) a multi-level retrieval scheme that treats products and attribute values as distinct hierarchical levels in the PAVI domain; (2) attribute value generation with a large language model, which significantly alleviates the OOD problem; and (3) successful deployment in a real-world industrial environment. Extensive experimental results demonstrate that MVP-RAG outperforms state-of-the-art baselines.
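The two-level retrieve-then-generate flow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Product` record, the token-overlap `jaccard` scorer (a toy stand-in for a learned retriever), the prompt layout in `build_prompt`, and `generate_stub`, which replaces the large language model with a simple majority vote over retrieved neighbours.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical record layout; the abstract does not specify one.
    title: str
    category: str
    attributes: dict  # attribute name -> standardized value


def jaccard(a, b):
    """Token-overlap similarity; a toy stand-in for a learned retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve_products(query, category, corpus, k=2):
    """Level 1: top-k similar products, restricted to the same category."""
    same_category = [p for p in corpus if p.category == category]
    return sorted(same_category, key=lambda p: jaccard(query, p.title), reverse=True)[:k]


def retrieve_values(query, attribute, vocabulary, k=3):
    """Level 2: top-k candidate values for one attribute."""
    values = vocabulary.get(attribute, [])
    return sorted(values, key=lambda v: jaccard(query, v), reverse=True)[:k]


def build_prompt(query, attribute, neighbours, candidates):
    """Assemble the retrieved context into a prompt (illustrative layout only)."""
    lines = [f"Product title: {query}", f"Attribute: {attribute}", "Similar products:"]
    for p in neighbours:
        lines.append(f"- {p.title} -> {p.attributes.get(attribute, 'unknown')}")
    lines.append("Candidate values: " + ", ".join(candidates))
    lines.append("Answer with one standardized value.")
    return "\n".join(lines)


def generate_stub(attribute, neighbours, candidates):
    """Placeholder for the LLM call: majority vote over neighbour values,
    falling back to the best-ranked candidate, or None if nothing is available."""
    votes = Counter(p.attributes[attribute] for p in neighbours if attribute in p.attributes)
    if votes:
        return votes.most_common(1)[0][0]
    return candidates[0] if candidates else None


if __name__ == "__main__":
    corpus = [
        Product("red cotton t-shirt for men", "apparel", {"color": "Red", "material": "Cotton"}),
        Product("blue cotton shirt for men", "apparel", {"color": "Blue", "material": "Cotton"}),
        Product("red ceramic mug", "kitchen", {"color": "Red"}),
    ]
    vocabulary = {"color": ["Red", "Blue", "Green"], "material": ["Cotton", "Polyester"]}
    query = "mens red cotton tee"
    neighbours = retrieve_products(query, "apparel", corpus)
    candidates = retrieve_values(query, "material", vocabulary)
    print(build_prompt(query, "material", neighbours, candidates))
    print(generate_stub("material", neighbours, candidates))
```

In a real system the stub would presumably be replaced by an actual model call on the assembled prompt, which is what would allow a value outside the candidate list to be produced; the stub above cannot do that and only shows how the two retrieval levels feed the generation step.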