LLM-Ensemble: Optimal Large Language Model Ensemble Method for E-commerce Product Attribute Value Extraction

Product attribute value extraction is a pivotal component in Natural Language Processing (NLP) and the contemporary e-commerce industry. Precise product attribute values are fundamental to ensuring high-quality recommendations and enhancing customer satisfaction. Recently emerging Large Language Models (LLMs) have demonstrated state-of-the-art performance in numerous attribute extraction tasks without the need for domain-specific training data. Nevertheless, different LLMs exhibit varying strengths and weaknesses owing to differences in their data, architectures, and hyperparameters. This variation makes them complementary to each other, with no single LLM dominating all others. It is therefore necessary to develop an ensemble method that leverages their complementary potential. In this paper, we propose a novel algorithm called LLM-ensemble that ensembles the outputs of different LLMs for attribute value extraction. We iteratively learn a weight for each LLM and aggregate the LLMs' labels using these weights to predict the final attribute value. Our proposed method can be proven theoretically optimal, and it also ensures efficient computation, fast convergence, and safe deployment. We have conducted extensive experiments with various state-of-the-art LLMs, including Llama2-13B, Llama2-70B, PaLM-2, GPT-3.5, and GPT-4, on Walmart's internal data. Our offline metrics demonstrate that the LLM-ensemble method outperforms every state-of-the-art single LLM on Walmart's internal dataset. The method has been launched in several production models, leading to improved Gross Merchandise Volume (GMV), Click-Through Rate (CTR), Conversion Rate (CVR), and Add-to-Cart Rate (ATC).
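The iterative weighting idea in the abstract can be illustrated with a small Python sketch. The abstract does not state the actual update rule, so this is not the authors' algorithm: it assumes a simple iterative weighted majority vote in which each model's weight is its agreement rate with the current consensus. The function names (`weighted_vote`, `ensemble`), the model names, and the toy attribute values are all hypothetical.

```python
from collections import defaultdict


def weighted_vote(labels, weights):
    """Return the label with the largest total weight for one item."""
    scores = defaultdict(float)
    for model, label in labels.items():
        scores[label] += weights[model]
    # Iterate labels in sorted order so ties break deterministically.
    return max(sorted(scores), key=lambda lab: scores[lab])


def ensemble(predictions, max_iters=20):
    """Alternate between voting and re-weighting until the consensus is stable.

    predictions: list of dicts, one per item, mapping model name -> extracted value.
    Assumption: a model's weight is its agreement rate with the consensus.
    """
    models = sorted({m for item in predictions for m in item})
    weights = {m: 1.0 for m in models}  # equal weights = plain majority vote
    consensus = []
    for _ in range(max_iters):
        new_consensus = [weighted_vote(item, weights) for item in predictions]
        if new_consensus == consensus:
            break  # consensus unchanged, so the weights are already consistent
        consensus = new_consensus
        for m in models:
            answered = [(item[m], c) for item, c in zip(predictions, consensus) if m in item]
            agree = sum(1 for label, c in answered if label == c)
            weights[m] = agree / len(answered) if answered else 0.0
    return consensus, weights


# Toy data: three hypothetical models labelling four product attributes.
predictions = [
    {"model_a": "red", "model_b": "red", "model_c": "blue"},
    {"model_a": "cotton", "model_b": "cotton", "model_c": "polyester"},
    {"model_a": "large", "model_b": "large", "model_c": "large"},
    {"model_a": "black", "model_b": "navy", "model_c": "navy"},
]
consensus, weights = ensemble(predictions)
print(consensus)
print(weights)
```

On this toy input, the model that disagrees most often with the consensus ends up with the smallest weight, so its votes count for less in later rounds. No ground-truth labels are needed, since the weights are learned from agreement alone.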
