Text relevance, or text matching between a query and a product, is an essential technique for e-commerce search systems: it ensures that the displayed products match the intent of the query. Many studies focus on improving the performance of the relevance model in search systems. Recently, pre-trained language models such as BERT have achieved promising performance on the text relevance task. While these models perform well on offline test datasets, their high latency remains an obstacle to deploying them in online systems. The two-tower model is widely used in industrial scenarios because it balances performance with computational efficiency. Unfortunately, such models are opaque "black boxes", which prevents developers from making targeted optimizations. In this paper, we propose the deep Bag-of-Words (DeepBoW) model, an efficient and interpretable relevance architecture for Chinese e-commerce. Our approach encodes the query and the product into a sparse BoW representation, which is a set of word-weight pairs. Each weight represents the importance or relevance score between the corresponding word and the raw text. The relevance score is computed by accumulating the matched words between the sparse BoW representations of the query and the product. Compared with the popular dense distributed representations, which usually suffer from this black-box drawback, the main advantage of the proposed representation is that it is highly explainable and open to manual intervention, a significant benefit for deploying and operating online search engines. Moreover, the online efficiency of the proposed model is even better than that of the most efficient inner-product form of dense representation ...
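The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only says that the score accumulates over matched words, so combining the two weights by multiplication is an assumption made here, and the names `SparseBoW`, `relevance_score`, and `matched_words`, as well as the toy words and weights, are hypothetical. In the paper the word-weight pairs are produced by a learned encoder, which this sketch does not model.

```python
from typing import Dict, Tuple

# A sparse BoW representation: a set of word-weight pairs.
SparseBoW = Dict[str, float]


def matched_words(query_bow: SparseBoW, product_bow: SparseBoW) -> Dict[str, Tuple[float, float]]:
    """Return the words present in both representations with both weights.

    This is what makes the score inspectable: every contribution to the
    final score can be traced back to a concrete word.
    """
    return {
        word: (q_weight, product_bow[word])
        for word, q_weight in query_bow.items()
        if word in product_bow
    }


def relevance_score(query_bow: SparseBoW, product_bow: SparseBoW) -> float:
    """Accumulate a score over the words matched between query and product.

    ASSUMPTION: each matched word contributes the product of its two weights.
    The abstract does not specify the exact combination rule.
    """
    matches = matched_words(query_bow, product_bow)
    return float(sum(q_w * p_w for q_w, p_w in matches.values()))


# Hypothetical weights, chosen by hand for illustration only.
query_bow = {"red": 0.5, "dress": 1.0}
product_bow = {"dress": 0.5, "red": 0.5, "cotton": 0.25}

score = relevance_score(query_bow, product_bow)

# Manual intervention: an operator removes a word judged misleading
# from the product representation, and the score changes accordingly.
edited_product_bow = {w: v for w, v in product_bow.items() if w != "red"}
edited_score = relevance_score(query_bow, edited_product_bow)
```

Because both representations are plain word-to-weight mappings, a developer can read off which words drove a score and adjust individual entries, which is the kind of intervention a dense embedding does not readily allow.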