Structured product data in the form of attribute/value pairs is the foundation of many e-commerce applications such as faceted product search, product comparison, and product recommendation. Product offers often only contain textual descriptions of the product attributes in the form of titles or free text. Hence, extracting attribute/value pairs from textual product descriptions is an essential enabler for e-commerce applications. In order to excel, state-of-the-art product information extraction methods require large quantities of task-specific training data. The methods also struggle with generalizing to out-of-distribution attributes and attribute values that were not a part of the training data. Due to being pre-trained on huge amounts of text as well as due to emergent effects resulting from the model size, Large Language Models like ChatGPT have the potential to address both of these shortcomings. This paper explores the potential of ChatGPT for extracting attribute/value pairs from product descriptions. We experiment with different zero-shot and few-shot prompt designs. Our results show that ChatGPT achieves a performance similar to a pre-trained language model but requires much smaller amounts of training data and computation for fine-tuning.
翻译:以属性/值对形式呈现的结构化产品数据是许多电子商务应用(如分面产品搜索、产品比较和产品推荐)的基础。产品信息通常仅包含以标题或自由文本形式呈现的产品属性文本描述。因此,从文本产品描述中提取属性/值对是电子商务应用的关键赋能技术。为达到卓越性能,当前最先进的产品信息提取方法需要大量任务专用训练数据,且难以泛化至训练数据未涵盖的分布外属性和属性值。由于在海量文本上预训练以及模型规模带来的涌现效应,ChatGPT等大型语言模型有望同时解决这两个缺陷。本文探索了利用ChatGPT从产品描述中提取属性/值对的潜力,实验了不同的零样本和少样本提示设计。结果表明,ChatGPT能达到与预训练语言模型相当的性能,但所需微调训练数据和计算量显著更少。