Retrieving the items a user intends from a massive item repository is one of the main goals of an e-commerce search system. Relevance prediction is essential to such a system because it helps improve search performance. When a relevance model is served online, it must perform inference that is both fast and accurate. The models in wide use today, such as the Bi-encoder and the Cross-encoder, are limited in accuracy and in inference speed, respectively. In this work, we propose a novel model called the Entity-Based Relevance Model (EBRM). We identify the entities contained in an item and decompose the query-item (QI) relevance problem into multiple query-entity (QE) relevance problems; we then aggregate their results into the QI prediction using a soft logic formulation. This decomposition lets us use a Cross-encoder QE relevance module for high accuracy while caching QE predictions for fast online inference. Using soft logic also makes the prediction procedure interpretable and intervenable. We further show that pretraining the QE module on QE data generated automatically from user logs improves overall performance. The proposed method is evaluated on labeled data from e-commerce websites. Empirical results show that it achieves promising improvements while remaining computationally efficient.
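The decomposition described above can be illustrated with a minimal sketch. It is not the EBRM implementation: the abstract does not specify the soft logic formulation, so the sketch assumes a product t-norm (a soft AND over the QE scores), and `toy_qe_model`, `soft_and`, and `EntityBasedScorer` are hypothetical names introduced only for illustration. The toy scorer stands in for a Cross-encoder QE module.

```python
from typing import Callable, Dict, Iterable, Tuple


def toy_qe_model(query: str, entity: str) -> float:
    """Hypothetical stand-in for a Cross-encoder QE relevance module.

    Returns a high score when the entity appears as a query token,
    and a low score otherwise.
    """
    return 0.9 if entity in query.split() else 0.2


def soft_and(scores: Iterable[float]) -> float:
    """Aggregate QE scores with a product t-norm (an assumed soft AND)."""
    result = 1.0
    for score in scores:
        result *= score
    return result


class EntityBasedScorer:
    """Scores a query-item pair from cached query-entity predictions."""

    def __init__(self, qe_model: Callable[[str, str], float]) -> None:
        self._qe_model = qe_model
        self._cache: Dict[Tuple[str, str], float] = {}
        self.model_calls = 0  # counts how often the QE model is invoked

    def qe_score(self, query: str, entity: str) -> float:
        """Return the QE score, calling the model only on a cache miss."""
        key = (query, entity)
        if key not in self._cache:
            self.model_calls += 1
            self._cache[key] = self._qe_model(query, entity)
        return self._cache[key]

    def qi_score(self, query: str, entities: Iterable[str]) -> float:
        """Combine the per-entity scores into one QI prediction."""
        return soft_and(self.qe_score(query, e) for e in entities)


scorer = EntityBasedScorer(toy_qe_model)
query = "red nike shoes"
matching_item = ("nike", "shoes", "red")
mismatched_item = ("adidas", "shoes", "red")

print(scorer.qi_score(query, matching_item))
print(scorer.qi_score(query, mismatched_item))
print(scorer.model_calls)  # 4: "shoes" and "red" are reused from the cache
```

Because items share entities, the second item triggers only one new QE model call. The per-entity scores also show which entity lowered the QI prediction, which is one way to read the interpretability claim, and overwriting a cached QE score would be one way to intervene on the prediction.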