Deep Bag-of-Words Model: An Efficient and Interpretable Relevance Architecture for Chinese E-Commerce

Text relevance, or text matching between a query and a product, is an essential technique in e-commerce search systems: it ensures that the displayed products match the intent of the query. Many studies focus on improving the performance of the relevance model in search systems. Recently, pre-trained language models such as BERT have achieved promising performance on the text relevance task. Although these models perform well on offline test datasets, their high latency remains an obstacle to deploying them in online systems. The two-tower model is widely used in industrial scenarios because it balances performance with computational efficiency. Unfortunately, such models are opaque "black boxes", which prevents developers from making targeted optimizations. In this paper, we propose the deep Bag-of-Words (DeepBoW) model, an efficient and interpretable relevance architecture for Chinese e-commerce. Our approach encodes the query and the product into sparse BoW representations, each a set of word-weight pairs. Each weight indicates the importance or relevance score between the corresponding word and the raw text. The relevance score is computed by accumulating the matched words between the sparse BoW representations of the query and the product. Compared with popular dense distributed representations, which usually suffer from the black-box drawback, the main advantage of the proposed representation is that it is highly explainable and open to intervention, which is a significant advantage for the deployment and operation of online search engines. Moreover, the online efficiency of the proposed model is even better than that of the most efficient inner-product form of dense representation ...
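
The matching step described in the abstract can be illustrated with a minimal sketch. The abstract says only that the relevance score accumulates the matched words of the two sparse BoW representations; it does not give the exact formula. The sketch below therefore assumes that each matched word contributes the product of its query weight and its product weight. The names `SparseBoW`, `relevance_score`, and `matched_words`, and all weights, are hypothetical and chosen only for illustration; the encoder that would produce the weights is not shown.

```python
from typing import Dict

# A sparse BoW representation: a mapping from word to weight.
# (Hypothetical type alias; the paper's actual data structure is not given.)
SparseBoW = Dict[str, float]


def matched_words(query_bow: SparseBoW, product_bow: SparseBoW) -> Dict[str, float]:
    """Return the per-word contribution of every word present in both sides.

    Assumption: a matched word contributes query_weight * product_weight.
    """
    return {
        word: weight * product_bow[word]
        for word, weight in query_bow.items()
        if word in product_bow
    }


def relevance_score(query_bow: SparseBoW, product_bow: SparseBoW) -> float:
    """Accumulate the contributions of all matched words into one score."""
    return sum(matched_words(query_bow, product_bow).values())


# Illustrative weights only; a real system would obtain them from the encoders.
query = {"red": 0.5, "dress": 1.0}
product = {"dress": 0.5, "skirt": 0.25, "red": 1.0, "summer": 0.5}

print(matched_words(query, product))    # which words matched, and by how much
print(relevance_score(query, product))  # accumulated relevance score

# Manual intervention: an operator can change one word's weight directly.
adjusted = dict(product, red=0.0)
print(relevance_score(query, adjusted))
```

Because the score decomposes into per-word contributions, a developer can inspect which words drove a match and adjust a single weight by hand, which is the kind of explainability and intervention the abstract refers to. The lookup cost grows with the number of words in the query rather than with a dense embedding dimension.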

