E-commerce platforms rely on structured product descriptions, in the form of attribute/value pairs, to enable features such as faceted product search and product comparison. However, vendors on these platforms often provide unstructured product descriptions consisting of a title and a textual description. To process such offers, e-commerce platforms must extract attribute/value pairs from the unstructured descriptions. State-of-the-art attribute/value extraction methods based on pre-trained language models (PLMs), such as BERT, face two drawbacks: (i) they require significant amounts of task-specific training data, and (ii) the fine-tuned models struggle to generalize to attribute values that were not part of the training data. We explore the potential of using large language models (LLMs) as a more training-data-efficient and more robust alternative to existing attribute/value extraction methods. We propose different prompt templates for instructing LLMs about the target schema of the extraction, covering both zero-shot and few-shot scenarios. In the zero-shot scenario, we compare textual and JSON-based approaches for representing information about the target attributes. In the scenario with training data, we investigate (i) the provision of example attribute values, (ii) the selection of in-context demonstrations, (iii) shuffled ensembling to prevent position bias, and (iv) fine-tuning the LLM. We evaluate the prompt templates in combination with hosted LLMs, such as GPT-3.5 and GPT-4, and with open-source LLMs based on Llama2 that can be run locally. GPT-4 reached the best average F1-score of 86% using an ensemble of shuffled prompts that combine attribute names, attribute descriptions, example values, and demonstrations. Given the same amount of training data, this prompt/model combination outperforms the best PLM baseline by an average of 6% F1.
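The combination of a JSON-based target schema with shuffled ensembling can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt wording, the function names `build_prompt` and `shuffled_ensemble`, and the `llm` callable (any function that maps a prompt string to a JSON string) are hypothetical. The sketch further assumes that shuffling is applied to the order of the in-context demonstrations and that the answers of the ensemble members are combined by a per-attribute majority vote.

```python
import json
import random
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical LLM interface: any function that maps a prompt string to a JSON string.
LLM = Callable[[str], str]


def build_prompt(schema: Dict[str, dict], demos: List[dict], offer: str) -> str:
    """Assemble an illustrative JSON-based extraction prompt (not the paper's exact template)."""
    parts = [
        "Extract the attributes described by the following JSON schema from the product offer.",
        "Answer with a JSON object and use null for attributes that are not mentioned.",
        json.dumps(schema, indent=2),
    ]
    # In-context demonstrations: an offer followed by its gold attribute values.
    for demo in demos:
        parts.append(f"Offer: {demo['offer']}")
        parts.append(f"Answer: {json.dumps(demo['answer'])}")
    # The target offer comes last; the model is expected to continue after "Answer:".
    parts.append(f"Offer: {offer}")
    parts.append("Answer:")
    return "\n".join(parts)


def shuffled_ensemble(llm: LLM, schema: Dict[str, dict], demos: List[dict],
                      offer: str, k: int = 3, seed: int = 0) -> Dict[str, object]:
    """Query the LLM k times with differently ordered demonstrations and majority-vote per attribute."""
    rng = random.Random(seed)
    votes: Dict[str, Counter] = {name: Counter() for name in schema}
    for _ in range(k):
        order = list(demos)
        rng.shuffle(order)  # assumption: shuffle the demonstrations, not the attributes
        answer = json.loads(llm(build_prompt(schema, order, offer)))
        for name in schema:
            # Assumes scalar (hashable) values; missing attributes count as None.
            votes[name][answer.get(name)] += 1
    # most_common keeps first-seen order on ties, so ties go to the earliest answer.
    return {name: counter.most_common(1)[0][0] for name, counter in votes.items()}


if __name__ == "__main__":
    schema = {
        "Brand": {"description": "Manufacturer of the product", "examples": ["Nike", "Adidas"]},
        "Color": {"description": "Main color of the product", "examples": ["red", "black"]},
    }
    demos = [
        {"offer": "Adidas Samba shoe, blue", "answer": {"Brand": "Adidas", "Color": "blue"}},
        {"offer": "Puma cap black", "answer": {"Brand": "Puma", "Color": "black"}},
    ]
    # Stub LLM returning canned answers, standing in for a hosted or local model.
    replies = iter([
        '{"Brand": "Nike", "Color": "red"}',
        '{"Brand": "Nike", "Color": "blue"}',
        '{"Brand": "Nike", "Color": "red"}',
    ])
    print(shuffled_ensemble(lambda prompt: next(replies), schema, demos, "Nike Air Max, red"))
    # prints {'Brand': 'Nike', 'Color': 'red'}
```

Because each ensemble member sees the demonstrations in a different order, a value that one prompt favors only because of where a demonstration happens to appear tends to be outvoted by the other members.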