We introduce SAGE, a generative LLM for inferring attribute values for products across worldwide e-Commerce catalogs. We propose a novel formulation of the attribute-value prediction problem as a Seq2Seq summarization task across languages, product types, and target attributes. This modeling approach lifts two restrictions: that attribute values be predicted from a pre-specified set of choices, and that the sought attribute values be explicitly mentioned in the text. SAGE can infer attribute values even when they are mentioned only implicitly, through periphrastic language, or not at all, as is the case for common-sense defaults. Additionally, SAGE can predict whether an attribute is inapplicable to the product at hand or cannot be obtained from the available information. SAGE is the first method able to tackle all aspects of the attribute-value prediction task as they arise in practical settings in e-Commerce catalogs. A comprehensive set of experiments demonstrates the effectiveness of the proposed approach, as well as its superiority over state-of-the-art alternatives. Moreover, our experiments highlight SAGE's ability to predict attribute values in a zero-shot setting, thereby opening up opportunities for significantly reducing the overall number of labeled examples required for training.
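To make the Seq2Seq formulation concrete, the following is a minimal sketch of how a single prediction could be cast as text-to-text input and output. It is an illustration under stated assumptions, not SAGE's actual implementation: the prompt layout, the sentinel strings `[N/A]` and `[UNKNOWN]`, and the names `build_source`, `parse_target`, and `Prediction` are all hypothetical, and no model is invoked.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sentinel outputs. The abstract does not say how SAGE encodes
# the "inapplicable" and "non-obtainable" cases; these strings are assumptions.
NOT_APPLICABLE = "[N/A]"
NOT_OBTAINABLE = "[UNKNOWN]"


@dataclass
class Prediction:
    status: str           # "value", "not_applicable", or "not_obtainable"
    value: Optional[str]  # free-text attribute value when status == "value"


def build_source(locale: str, product_type: str, attribute: str, text: str) -> str:
    """Serialize one query into a single input sequence (assumed layout)."""
    return (
        f"locale: {locale} | product type: {product_type} | "
        f"attribute: {attribute} | text: {text}"
    )


def parse_target(generated: str) -> Prediction:
    """Interpret the generated sequence as a value or one of the two sentinels."""
    out = generated.strip()
    if out == NOT_APPLICABLE:
        return Prediction("not_applicable", None)
    if out == NOT_OBTAINABLE:
        return Prediction("not_obtainable", None)
    # Any other string is treated as an open-vocabulary attribute value,
    # so the output is not restricted to a pre-specified set of choices.
    return Prediction("value", out)
```

Because the target is free text, the same interface covers explicitly stated values, implicitly stated values, and common-sense defaults, while the two sentinels cover the inapplicable and non-obtainable cases described in the abstract.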