Effective query-item relevance modeling is pivotal for enhancing user experience and safeguarding user satisfaction in e-commerce search systems. Recently, benefiting from their vast inherent knowledge, Large Language Model (LLM)-based approaches have demonstrated strong performance and long-tail generalization ability compared with previous specialized neural relevance-learning methods. Though promising, current LLM-based methods suffer from the following shortcomings in practice: First, their massive parameter counts and computational demands make them difficult to deploy online. Second, distilling LLMs into online models is a feasible direction, but LLM relevance modeling is a black box, and its rich intrinsic knowledge is difficult to extract and apply online. To improve the interpretability of LLMs and to boost the performance of online relevance models via LLMs, we propose an Explainable LLM-driven Multi-dimensional Distillation framework for e-commerce relevance learning, which comprises two core components: (1) an Explainable LLM for relevance modeling (ELLM-rele), which decomposes relevance learning into intermediate steps and models it as Chain-of-Thought (CoT) reasoning, thereby enhancing both the interpretability and the performance of the LLM; and (2) a Multi-dimensional Knowledge Distillation (MKD) architecture that transfers the knowledge of ELLM-rele to currently deployable interaction-based and representation-based student models from both the relevance score distribution and the CoT reasoning aspects. By distilling both probabilistic and CoT reasoning knowledge, MKD improves the semantic interaction and long-tail generalization abilities of the student models. Extensive offline evaluations and online experiments on the Taobao search ad scene demonstrate that the proposed framework significantly enhances e-commerce relevance learning performance and user experience.
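The score-distribution side of such a teacher-student setup is commonly realized as a temperature-softened KL divergence between the teacher's and student's relevance distributions. The sketch below illustrates that generic soft-label distillation objective only; the function names, temperature value, and scaling are illustrative assumptions, not the paper's actual MKD loss.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of relevance logits."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened score
    distributions -- the generic soft-label distillation objective.
    The T^2 factor keeps gradient magnitudes comparable across
    temperatures (illustrative choice, not from the paper)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return temperature ** 2 * sum(
        pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0
    )
```

With identical teacher and student logits the loss is zero, and it grows as the student's distribution drifts from the teacher's, which is the signal a student relevance model would be trained to minimize alongside its hard-label objective.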