In e-commerce search, relevance between queries and documents is essential to a satisfying user experience. Unlike traditional e-commerce platforms, which offer products, life service platforms such as Meituan are searched mainly for product providers, which usually carry abundant structured information, e.g. name, address, category, and thousands of products. Modeling search relevance over such rich structured content is challenging for two reasons: (1) the language distribution differs across the fields of a structured document, making it difficult to directly adopt off-the-shelf methods based on pretrained language models such as BERT; (2) different fields usually differ in importance and vary greatly in length, making it difficult to extract the document information that is helpful for relevance matching. To tackle these issues, we propose a novel two-stage pretraining and matching architecture for relevance matching with rich structured documents. In the pretraining stage, we propose an effective pretraining method that takes both the query and multiple fields of the document as input, and includes an effective information compression method for lengthy fields. In the relevance matching stage, we propose a novel matching method that leverages domain knowledge in the search query to generate more effective document representations for relevance scoring. Extensive offline experiments and online A/B tests on millions of users verify that the proposed architecture effectively improves the performance of relevance modeling. The model has been deployed online and has served Meituan's search traffic for over a year.